Help needed with System V message queues
George Bogatko
bogatko at lzga.ATT.COM
Thu Aug 9 02:36:56 AEST 1990
HI:
A while ago, I got sick of wondering what the tunables for messages meant.
The result was a writeup which was begun, interrupted, and never
finished. This is what remains of it. It is 500+ lines long, so you
may want to skip all this if message queues leave you cold. After it,
in a second posting, is a 'pic' file of how messages are stored in memory.
This may answer some questions about message queues, and how they work.
If I see enough interest, I may finish the writeup.
************
OVERVIEW
In /etc/master.d there is a file called msg that contains
the tunable parameters for the device driver that handles
the UNIX(TM) message queue system. This file contains lines
similar to the following:
MSGMAP = 100
MSGMAX = 2048
MSGMNB = 4096
MSGMNI = 50
MSGSSZ = 8
MSGTQL = 40
MSGSEG = 1024
The meaning of these parameters, as stated in the manual
Operations and Administration Series: Performance Management,
under the chapter Tunable Parameters Definitions, is:
MSGMAP specifies the size of the memory control map
used to manage message segments. If this
value is insufficient to handle the message
type facilities, a warning message is sent to
the console.
MSGMAX specifies in bytes the maximum size of a
message sent. When receiving a message, a
value larger than this parameter can be used
to ensure that the whole message is received
and not truncated.
MSGMNB specifies the maximum length, in bytes, of a
message queue. The owner of a facility can
lower this value, but only the superuser can
raise it.
MSGSEG specifies the number of message segments in
the system. MSGSEG * MSGSSZ should be less
than 131,072 bytes (128 kilobytes).
MSGSSZ specifies the size in bytes of a message
segment as stored in memory. Each message is
stored in a contiguous message segment. The
larger the segments are, the greater the
chance of having wasted memory at the end of
each message. MSGSSZ * MSGSEG should be less
than 131,072 bytes (128 kilobytes).
MSGTQL specifies the number of message queue headers
on all message queues system-wide, and thus,
the number of outstanding messages.
These values are stored in a structure called "struct msginfo"
which looks like this:
struct msginfo {
int msgmap,
msgmax,
msgmnb,
msgmni,
msgssz,
msgtql;
ushort msgseg;
};
This structure is used when the MSG driver is initialized.
Notice that the values are stored as INTS. This bites you
in the butt later on when the value of 'msgmnb' is put
into 'msg_qbytes'.
All the structures described here are found in '/usr/include/sys/msg.h'
There are four data structures used in the message queue
universe.
struct msgbuf
This is actually a template used by the user to hold
the message; it is assumed (incorrectly) that the user
knows enough to re-write this template to suit their
needs. The structure given in the header file msg.h is
never used. The only thing that must be here is the
member "long mtype", which must always be the first
member. The template looks like this:
struct msgbuf {
long mtype; /* message type */
char mtext[1]; /* message text */
};
Notice that the size of mtext is one (1). This is not
enough room to hold anything useful.
Newcomers to UNIX(TM) messages almost always bark their
shins on this one.
struct msg
This is the structure that points to the address of the
actual message in the message pool. It looks like
this:
struct msg {
struct msg *msg_next;
long msg_type;
short msg_ts;
short msg_spot;
};
The message queue driver keeps these structures, called
message headers, in an array whose size is determined
by the tunable parameter MSGTQL. Each outstanding
message, i.e. a message that has been sent but not yet
received, has one of these headers associated with it.
Thus the number of outstanding messages handled by the
driver is determined by the setting of MSGTQL.
The members of the structure are used as:
msg_next
Even though these headers are in an array, they
are handled like a linked list. This member
points to the next header, which is located
somewhere in the array.
msg_type
This corresponds to the long mtype member of the
msgbuf structure shown above.
msg_ts
This is the precise length in bytes of the message
that is stored in the message pool.
msg_spot
This is the location in the message pool of the
message. Notice that it is not a pointer. It is
really an offset.
struct msqid_ds
This is the structure that keeps vital statistics about
a particular message queue that is being serviced by
the driver. It looks like this:
struct msqid_ds {
struct ipc_perm msg_perm;
struct msg *msg_first;
struct msg *msg_last;
ushort msg_cbytes;
ushort msg_qnum;
ushort msg_qbytes;
ushort msg_lspid;
ushort msg_lrpid;
time_t msg_stime;
time_t msg_rtime;
time_t msg_ctime;
};
The driver keeps these structures in an array, whose
size is determined by the tunable parameter MSGMNI.
Thus the maximum number of message queues handled by
the driver is determined by the setting of MSGMNI.
The members of the structure are used as:
msg_perm
This is a structure located in ipc.h that contains
various permissions and ids. It looks like this:
struct ipc_perm {
ushort uid; /* owner's user id */
ushort gid; /* owner's group id */
ushort cuid; /* creator's user id */
ushort cgid; /* creator's group id */
ushort mode; /* access modes */
ushort seq; /* slot usage sequence number */
key_t key; /* key */
};
This structure is used by all the drivers in the
IPC system. The meaning of these variables is
clear from their names and the comments, except
for seq.
This variable holds a sequence number that is used
to determine the msqid that is returned from the
msgget() call. By constantly incrementing this
number, one can be sure that the same message
queue header will not have the same msqid returned
when it is re-allocated.
msg_first
This is a pointer to the first struct msg member
in the linked list of struct msg message headers.
msg_last
This is a pointer to the last struct msg member in
the linked list of struct msg message headers.
msg_cbytes
This is the total number of bytes currently on the
queue. It represents the accumulated total of all
the values of the struct msg member msg_ts in the
linked list of outstanding message headers for
that particular queue.
msg_qnum
This is how many messages are outstanding on the
queue; and thus how many struct msg message
headers are linked to this queue header.
msg_qbytes
This is an upper limit of how many bytes can be
outstanding on the queue. This is an arbitrary
value, which is set from the value of the tunable
parameter MSGMNB. This means that raising or
lowering this value does not alter how many total
messages you can have handled by the driver. That
is determined by the size of the message pool. This
value can be altered by either the owner/creator of
the message queue, or the super-user, without having
to change the value of MSGMNB.
msg_lspid
The process id of the last process to send a
message.
msg_lrpid
The process id of the last process to receive a
message.
msg_stime
The last time a message was put on this queue.
msg_rtime
The last time a message was taken off this queue.
msg_ctime
The last time anything at all happened regarding
this header.
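The msg_qbytes limit described above can be read and changed from user code with msgctl(). What follows is only a sketch, assuming a present-day <sys/msg.h>; the helper name set_queue_limit is made up for illustration and is not part of any library.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Set msg_qbytes for one queue. Returns 0 on success, -1 with errno
 * set on failure. Hypothetical helper, not a library routine. */
int set_queue_limit(int msqid, unsigned long new_limit)
{
    struct msqid_ds ds;

    assert(new_limit > 0);

    /* Fetch the current header first, so the other settable fields
     * (owner, group, mode) are written back unchanged. */
    if (msgctl(msqid, IPC_STAT, &ds) == -1)
        return -1;

    ds.msg_qbytes = new_limit;

    /* As described above: the owner may lower the limit, but raising
     * it past the system setting is reserved for the super-user. */
    return msgctl(msqid, IPC_SET, &ds);
}
```

Note that this changes one queue only; MSGMNB itself, and the size of the message pool, stay as they were.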
The message pool
There is no formal name for the pool in the msg.h
structure. The message pool is an amorphous blob of
memory. It is obtained by a call to the kernel function
kseg() which returns the base address of a segment of kernel
memory. This address is cast to type paddr_t (physical
address type) which on the 3B2 line is a long. It is
treated as a contiguous array of single bytes. Think of it
as a char array.
The size of this blob is determined by multiplying the
values in the two tunable parameters MSGSEG and MSGSSZ.
The result of the multiplication is then rounded up to
the nearest page size. This size is passed as a
parameter to kseg().
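The rounding just described can be sketched as follows. The 2048-byte page size is the 3B figure given later in this posting, and the 128K ceiling is the one quoted from the tunables guide; the function names are made up for the example.

```c
#include <assert.h>

/* Assumptions taken from this posting: 2048-byte pages on the 3B2,
 * and a ceiling of 128K (64 such pages) on the pool. */
#define PAGE_BYTES 2048L
#define MAX_PAGES  64L

/* Pages needed for a pool of msgseg segments of msgssz bytes each,
 * rounded up to a whole page. */
long pool_pages(long msgssz, long msgseg)
{
    assert(msgssz > 0 && msgseg > 0);
    return (msgssz * msgseg + PAGE_BYTES - 1) / PAGE_BYTES;
}

/* Nonzero if the pool stays within the 128K ceiling. */
int pool_fits(long msgssz, long msgseg)
{
    return pool_pages(msgssz, msgseg) <= MAX_PAGES;
}
```

With the sample settings from the overview (MSGSSZ = 8, MSGSEG = 1024) the pool is 8192 bytes, which is 4 pages.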
The argument to kseg() is a request for pages of
memory, in the range 1 - 64. 64 pages (128K) is the
upper limit of memory that will be returned by kseg().
This is why the tunable parameters guide mentioned
above says:
"MSGSSZ * MSGSEG should be less than 131,072 bytes (128 kilobytes)."
SENDING A MESSAGE
Before diving into how the driver handles messages, it
might be best to present a general picture of how
outstanding messages are stored.
Recall from part 1 that there are four data structures
involved in the process: struct msgbuf, struct msg, struct
msqid_ds, and the memory pool. Briefly, the msqid_ds
structure points to the first instance of a message header,
which is a msg structure. Each message header points to the
next message header in the linked list of outstanding
messages. Each message header contains the offset in the
memory pool where the actual message is being stored.
The memory pool is logically divided into segments (of size
MSGSSZ), and the total number of these segments is
MSGSEG. When a message is actually stored, it will occupy
as much space as necessary, rounded up to the nearest
segment size. From this it can be seen that unless the
message size aligns with the MSGSSZ segment size, there will
be some wasted bytes associated with each stored message.
The enclosed diagram, "Mapping a Message Queue ID to a 28 Byte
Message in Kernel Memory", displays the association of all
these data structures in the job of holding a 28 byte
message in memory.
**** SEE FOLLOWING POSTING FOR PIC FILE ****
3.1 Conversion of a user supplied message queue ID (msqid)
to a pointer to a struct msqid_ds queue header
This is a fundamental algorithm in the whole process, and is
called from many places in the driver:
algorithm 'msgconv'
convert msqid to msqid_ds structure
{
1. pointer = address of array,
offset by msqid
modulo MSGMNI
2. lock the reference to the structure
3. if((the structure is not in use) ||
the sequence number doesn't equal
the msqid divided by MSGMNI)
return EINVAL
4. return the pointer found in step 1.
}
step 1. convert msqid to pointer
Remember that the msqid_ds structures are held in an
array of size MSGMNI. Assuming that the call msgget()
works correctly (it does), the msqid that is returned
from that call will always be the result of an
incrementing sequence number (remember struct
ipc_perm.ushort seq?) times the value of MSGMNI, plus
the offset into the msqid_ds array of the assigned
message header. Thus if MSGMNI is 100, the sequence
number 3, and the offset 30, the message queue id
returned by msgget() would be 330.
The line that converts the msqid back to the offset is
simply:
qp = &msgque[id % msginfo.msgmni];
Which is to say that the offset of the array (here
msgque) is msqid modulo MSGMNI. In our example, this
would be 30, which is indeed the offset of the array.
step 2. Lock the reference to the structure
A parallel char array of size MSGMNI is kept. Once
the offset is found, it is used to find the associated
lock value in this lock array. As long as this value is
1, the process sleeps (lines 1 and 2 below).
1 while (*lockp)
2 sleep(lockp, PMSG);
3 *lockp = 1;
When another process, which is using this message
header, is done, it sets the value of this lock to 0,
and issues a wakeup(). Our process then wakes up and
now finds the value to be 0. It then stops going to
sleep, and locks the value (line 3).
This is what allows message queues to act "atomically".
step 3. check the msqid value for validity
The value of qp->msg_perm.seq is checked against the
value of msqid divided by MSGMNI and if they don't
match, then errno is set to EINVAL and the system call
returns with an error (-1). In our example, if the
msqid is 330, then 330/100 (in integer arithmetic)
yields 3, which is indeed the sequence number. Thus
the msqid is successfully converted to a valid and
active message header.
step 4. return the queue pointer
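Steps 1, 3 and 4 can be sketched in ordinary C as below. This is a user-space model, not the driver source: struct queue_slot is a cut-down stand-in for struct msqid_ds, the in_use flag and the function make_msqid are invented for the example, and the locking of step 2 is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MSGMNI 100              /* value from the worked example */

/* Cut-down stand-in for struct msqid_ds: only what msgconv reads. */
struct queue_slot {
    int            in_use;      /* hypothetical "allocated" flag */
    unsigned short seq;         /* plays the part of ipc_perm.seq */
};

struct queue_slot msgque[MSGMNI];

/* Build the id that msgget() would hand out for a slot. */
int make_msqid(unsigned short seq, int slot)
{
    assert(slot >= 0 && slot < MSGMNI);
    return seq * MSGMNI + slot;
}

/* Steps 1, 3 and 4 of msgconv. Returns NULL where the driver
 * would fail with EINVAL. */
struct queue_slot *msgconv(int id)
{
    struct queue_slot *qp;

    if (id < 0)
        return NULL;
    qp = &msgque[id % MSGMNI];                  /* step 1 */
    if (!qp->in_use || qp->seq != id / MSGMNI)  /* step 3 */
        return NULL;
    return qp;                                  /* step 4 */
}
```

With slot 30 in use at sequence number 3, make_msqid(3, 30) gives 330 and msgconv(330) finds slot 30 again, while an old id such as 230 for the same slot is rejected.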
3.2 Sending_a_message
Sending a message consists of receiving a buffer from the
user, and putting it into the message pool, with proper
labeling so that a receive request can copy that message out
of the message pool.
algorithm 'msgsnd'
send a message
{
1. convert msqid to msqid_ds pointer (algorithm msgconv)
2. if(access denied by incorrect permissions)
return EACCES
3. if(byte count <= 0 || byte count > MSGMAX)
return EINVAL
4. copy the message type from the user area
return EFAULT on error
5. if(message type <= 0)
return EINVAL
GETRES:
6. if(queue has been removed or changed)
return EIDRM
7. if( (total bytes in queue > MSGMNB) ||
(no free msg headers available [MSGTQL]) )
{
8. if(IPC_NOWAIT set)
return EAGAIN
9. sleep
10. if(sleep was interrupted)
return EINTR
11. goto GETRES
}
12. call 'malloc()' to find free slot in
message pool.
13. if(no free space in message pool)
{
if(IPC_NOWAIT set)
return EAGAIN
sleep
if(sleep was interrupted)
return EINTR
goto GETRES
}
14. assuming all is OK, copy from user to
message pool.
15. if(system error during copy)
{
call 'mfree' to mark slot as free
return EFAULT
}
16. update 'msqid_ds' header
17. initialize 'msg' header
18. link 'msg' header into chain of
related 'msg' headers.
19. return 0
}
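The checks in steps 3, 5, 7 and 8 can be modelled outside the kernel as below. This is a sketch of my reading of the pseudocode, not driver code: the struct, its field names and the WOULD_SLEEP marker are invented, and step 7 is read as "bytes already queued plus the new message exceed the queue limit".

```c
#include <assert.h>
#include <errno.h>

/* Snapshot of the state the checks look at. Field names are
 * illustrative, not the driver's. */
struct send_state {
    long msgmax;        /* MSGMAX tunable */
    long qbytes;        /* msg_qbytes for this queue */
    long cbytes;        /* msg_cbytes: bytes already queued */
    int  free_headers;  /* unused struct msg slots (out of MSGTQL) */
    int  nowait;        /* caller passed IPC_NOWAIT */
};

#define WOULD_SLEEP (-1)    /* made-up marker: the driver sleeps here */

/* Returns 0 if the send may go on to the pool allocation (step 12),
 * an errno value, or WOULD_SLEEP. */
int msgsnd_precheck(const struct send_state *s, long mtype, long cnt)
{
    assert(s != 0);
    if (cnt <= 0 || cnt > s->msgmax)
        return EINVAL;                              /* step 3 */
    if (mtype <= 0)
        return EINVAL;                              /* step 5 */
    if (s->cbytes + cnt > s->qbytes || s->free_headers == 0)
        return s->nowait ? EAGAIN : WOULD_SLEEP;    /* steps 7-9 */
    return 0;
}
```

Nothing here looks at the message pool itself; that only happens at step 12, after these checks have passed.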
*****************
At this point I was interrupted by real work, and never returned to the
writeup. The "Bach Book" has a good writeup on this stuff.
A few points however:
MSGMAX is not MSGMNB. MSGMAX is the largest message you can send. You
can't find out the value of MSGMAX from the msqid_ds structure.
MSGMNB can be found out from "msg_qbytes". Recall from the above, that
you can reset this to a higher value if you are super-user
and always to a lower value if you are the owner.
What is important to note, however, is that this number has nothing to
do with the capacity of the driver. It is just a number that is compared against.
MSGMNB is stored in "struct msginfo" as an INT, but when the driver is
initialized, it is transferred to "struct msqid_ds" in member "msg_qbytes"
which is a "ushort". Thus while some documentation may say that you
can have a high message queue maximum, you can only have the upper
limits of a ushort (65535). If you set MSGMNB to a higher value
than this, it will wrap, and you will wind up with a lower value.
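The wrap is easy to reproduce. In this sketch uint16_t stands in for the 16-bit ushort of the 3B2 (an assumption about that machine), and the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* What msg_qbytes ends up holding when an int MSGMNB is stored in a
 * 16-bit unsigned field. */
unsigned effective_qbytes(int msgmnb)
{
    assert(msgmnb >= 0);
    return (uint16_t)msgmnb;    /* keeps only the low 16 bits */
}
```

So a MSGMNB of 70000 would leave each new queue with a limit of 4464 bytes (70000 - 65536), and a MSGMNB of exactly 65536 would leave it with 0.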
If you want to increase the size of the message pool, increase
MSGSEG. NEVER INCREASE MSGSSZ. Recall from above:
"The memory pool is logically divided into segments (of size
MSGSSZ), and the total number of these segments is
MSGSEG. When a message is actually stored, it will occupy
as much space as necessary, rounded up to the nearest
segment size. From this it can be seen that unless the
message size aligns with the MSGSSZ segment size, there will
be some wasted bytes associated with each stored message."
This means that if MSGSSZ is 50 bytes, and you have a 10 byte message,
you will occupy one slot, and have 40 bytes of wasted storage hanging
around. But if you keep the value of 8, then you will occupy 2 slots
and have 6 bytes of wasted storage hanging around.
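The arithmetic behind those two cases, as a small sketch (the function names are made up for the example):

```c
#include <assert.h>

/* Segments a message of len bytes occupies: len rounded up to a
 * whole number of msgssz-byte segments. */
long segs_needed(long len, long msgssz)
{
    assert(len >= 0 && msgssz > 0);
    return (len + msgssz - 1) / msgssz;
}

/* Bytes allocated but unused at the end of the last segment. */
long wasted_bytes(long len, long msgssz)
{
    return segs_needed(len, msgssz) * msgssz - len;
}
```

The same sums applied to the 28 byte message in the diagram, with MSGSSZ at 8, give 4 segments and 4 wasted bytes.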
See the following posting for a pic file of how this storage works.
In the msgsnd description above is:
12. call 'malloc()' to find free slot in
message pool.
13. if(no free space in message pool)
{
if(IPC_NOWAIT set)
return EAGAIN
sleep
if(sleep was interrupted)
return EINTR
goto GETRES
}
Notice that there is no escape from this if the size of the message you are
trying to send is greater than the total amount of memory in the message
pool. Thus you can hang forever waiting for enough room to become
available. This will be allowed if MSGMAX and MSGMNB are both set greater than
the size of the message pool, which is MSGSSZ * MSGSEG bytes (rounded up to
a whole page; pages are 2048 bytes on 3B's).
Be careful when setting the tunables!!!
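One way to check a set of tunables for this trap before installing them is sketched below. It rests on my reading of the msgsnd steps: a message gets as far as the pool allocation only if it is no bigger than both MSGMAX and the queue limit, so the hazard exists when the smaller of those two exceeds the pool. The function name is made up.

```c
#include <assert.h>

/* Nonzero if some message could pass the size checks in msgsnd and
 * still be too big for the pool ever to hold, so that a blocking
 * send of it would wait forever. Assumes the usable pool is
 * msgssz * msgseg bytes. */
int send_can_hang(long msgmax, long msgmnb, long msgssz, long msgseg)
{
    long pool, largest;

    assert(msgssz > 0 && msgseg > 0);
    pool = msgssz * msgseg;
    largest = msgmax < msgmnb ? msgmax : msgmnb;
    return largest > pool;
}
```

The sample values from the overview are safe by this test: the largest admissible message is 2048 bytes and the pool is 8192.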
When sending messages, the third parameter must be the size of
the actual message, not the size of the 'struct msgbuf', which
will be sizeof(long) bytes bigger. If you don't pay attention
to this, you will get message trashing. I can't recall now how,
when, or why this happens, but I guarantee that it will be mysterious,
and usually fatal.
struct msgbuf {
long mtype; /* message type */
char mtext[1]; /* message text */
};
So the safe way to use 'msgsnd' is:
typedef struct {
long mtype;
struct {
char buf[10];
int xxx;
double yyy;
etc...
} mtext;
} MSG;
MSG message;
1. msgsnd( msqid, &message, sizeof(message.mtext), 0);
OR
2. msgsnd( msqid, &message, sizeof(message)-sizeof(long), 0);
I prefer #1.
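Here is form #1 filled out so that it compiles. The payload fields are placeholders carried over from the template above, and send_msg is a made-up wrapper, not a library call.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Example payload; the fields are placeholders. */
struct payload {
    char   buf[10];
    int    xxx;
    double yyy;
};

typedef struct {
    long           mtype;   /* must come first and be > 0 */
    struct payload mtext;
} MSG;

/* Send one message, passing the size of the payload only (form #1),
 * never sizeof(MSG). Returns whatever msgsnd() returns. */
int send_msg(int msqid, const MSG *m)
{
    assert(m != 0 && m->mtype > 0);
    return msgsnd(msqid, m, sizeof m->mtext, 0);
}
```

The receiving side passes the same sizeof(message.mtext) as the third argument to msgrcv().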
I also prefer to send messages with the fourth parameter as 0. This will
make the call block if there is no room at the time. It is more normal
for the call to block in a heavily loaded system than not, so you
will probably NOT want to die if you can't send the message because
there's no room. If you are worried about deadlock, put in an
alarm call so you can time out.
When receiving messages, check for EINTR if you get a -1.
You will usually want to just wrap around and try again if you get
an interrupt (usually from an alarm call, or some other friendly signal).
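A receive loop along those lines, as a sketch; recv_retry is a made-up name.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* msgrcv() that tries again when a signal interrupts the wait.
 * Returns the payload length, or -1 with errno set for any error
 * other than EINTR. Hypothetical helper. */
ssize_t recv_retry(int msqid, void *msg, size_t len, long type, int flags)
{
    ssize_t n;

    assert(msg != 0);
    do {
        n = msgrcv(msqid, msg, len, type, flags);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

If an alarm is being used as a timeout, retrying on every EINTR would defeat it; in that case have the handler set a flag and test the flag before going round again.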
Hope this (long-winded) yakking helps somebody.
GB