File names from file descriptors and Checkpoints (Summary Long)
Robert Side
rside at uvicctr.UUCP
Sat Sep 10 06:31:14 AEST 1988
Here, finally, is the summary on how to get a file name from
a file descriptor.
First, I would like to thank all the people that responded
to my problem of checkpointing processes, as well as how to get a file
name from a file descriptor. I tried to respond to everyone
who sent me mail and I think I was more successful this time,
but if a reply did not reach you, let me now say *thank you* for your
reply.
Second, I would like to thank two people, Dave Curry and der Mouse,
for sending me the source to their solutions to my problem. As an
aside, I will *NOT* send their source to anyone, since I do not
have permission from them to do so. If you feel you need the source,
I suggest you mail these two people directly.
Finally, in short, I believe the problem of checkpointing a process with
open files has been solved, at least to my satisfaction. The specific
question of an easy way of finding the file name from a file descriptor
is not solved. There may not even be a solution to this problem,
as is discussed below.
-----------------
From: Amos Shapir <taux01!taux02.taux01.UUCP!amos at nsc>
>Summary: it's impossible. All a process has is a file descriptor, which
>may be connected to a pipe (and in modern systems, to a socket whose
>other end is in Timbuktu). Even if it is a regular file, it may have
>been inherited from a great-grandparent, so changing fopen to keep track
>of file names is not sufficient.
>--
> Amos Shapir amos at nsc.com
>National Semiconductor (Israel)
>6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel Tel. +972 52 522261
>34 48 E / 32 10 N (My other cpu is a NS32532)
-----------------
From: uunet!dalsqnt!vector!chip (Chip Rosenthal)
>Not easily.
>
>You could do it by calling fstat() with the filedes, which will give you
>the inode of the file and device it resides on. Then you have to search
>through that device for all directory entries which reference this inode.
>This is what the SysV ncheck(1) does -- or at least it's what the XENIX
>V ncheck(C) does. In both cases you need superuser privileges. Also,
>this is not real clean -- possible problems: the filedes is a pipe, the
>file contains multiple links, the file has been rm'ed by another process,
>etc.
>---
>Chip Rosenthal chip at vector.UUCP | I've been a wizard since my childhood.
>Dallas Semiconductor 214-450-0486 | And I've earned some respect for my art.
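
The fstat()-then-search approach Chip describes can be sketched roughly as
follows. This is a minimal illustration, not ncheck itself: the helper
names (fd_to_name, visit) and the fixed 4096-byte buffer are assumptions
made for the example, and it walks an ordinary directory tree with nftw()
instead of reading the raw device, so it needs no superuser privileges but
only sees directories the caller can read. It returns the first name found,
so additional links are not reported, and a pipe or an rm'ed file simply
yields no name.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* nftw() callbacks receive no user pointer, so the target identity and
 * the result live at file scope (hypothetical names for this sketch). */
static dev_t want_dev;
static ino_t want_ino;
static char found_path[4096];

/* Called for every entry in the tree; stops the walk on the first match. */
static int visit(const char *path, const struct stat *sb, int type,
                 struct FTW *ftw)
{
    (void)ftw;
    if (type == FTW_NS)            /* stat failed on this entry; skip it */
        return 0;
    if (sb->st_dev == want_dev && sb->st_ino == want_ino) {
        snprintf(found_path, sizeof found_path, "%s", path);
        return 1;                  /* nonzero return ends the walk */
    }
    return 0;
}

/* Look below `root` for a directory entry naming the file open on `fd`.
 * Returns 0 and copies one name into buf, or -1 if nothing was found. */
int fd_to_name(int fd, const char *root, char *buf, size_t len)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    want_dev = st.st_dev;
    want_ino = st.st_ino;
    found_path[0] = '\0';

    /* FTW_PHYS: do not follow symlinks; FTW_MOUNT: stay on root's device */
    if (nftw(root, visit, 16, FTW_PHYS | FTW_MOUNT) != 1)
        return -1;
    if (strlen(found_path) >= len)
        return -1;                 /* caller's buffer is too small */
    strcpy(buf, found_path);
    return 0;
}
```

A caller would pass the mount point of the file's device as root, or a
smaller subtree when the file is likely to be nearby, for example
fd_to_name(fd, ".", name, sizeof name).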
-----------------
[ I lost the first message received from Dave Curry (shame on me);
however, I will try to state approximately what he said ]
From: davy at relay.ubc.ca (Dave Curry) (Message 1)
> [I (Dave Curry) have written a set of library routines that will]
> [checkpoint and recover processes. They were written on a VAX for]
> [BSD 4.2. I do not remember if they handle sockets, but they do]
> [handle open files and pipes. If you like I can mail you a copy.]
> [The only request that I make is that if you use my code that you]
> [send me the diffs]
>
[ I (Rob speaking now) sent Dave mail asking him if he could dig up the source
and send it to me and his next response (along with a transcript
of my message) follows ]
From: davy at relay.ubc.ca (Dave Curry) (Message 2)
> [ This is Rob speaking in the indented stuff ]
>
> [ Some stuff deleted ]
>
> I would like to take a look at the code, from what you have said
> it is pretty close to meeting my specs. There are a few things
> I am worried about. There will be open sockets. I guess I never
> said this in the article but when a rollback occurs it must
> overwrite the current memory image to keep the same process id
>
> [ Some Stuff deleted ]
>
>Keeping the same pid is easy enough, I guess. The library writes the
>executable to the file "chkpt.dat" (user-settable), so assuming you
>have a process with the correct pid running, all you need to do is
>execl() "chkpt.dat", and you're all set.
>
>I'm still not sure how you'd go about creating sockets. It's easy enough
>to "reopen" them I guess, and you could probably even save all the connect
>info and reconnect them to their servers. But unless your servers and
>clients are all stateless, you're going to have a hard time putting the
>whole mess back into the same state.
>
> It sounds like your library can be modified to meet my needs. I
> have written routines to checkpoint and rollback processes that
> do not have open files, so if I could see how you restore the
> files this would be a great help.
>
> It would be *much* appreciated if you could dig up the code and
> mail it to me. If I make any changes I CERTAINLY will mail you
> the diffs or the complete source of the changes if it is deemed
> necessary.
>
>I'll probably have to pull it off tape. I'll see if I can get to it
>today or tomorrow, if at all possible.
>
> [ My signature Deleted ]
>
>--Dave
>
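
Dave's remark that execl() keeps the pid can be checked with a small
sketch. The real checkpoint image ("chkpt.dat") is not available here, so
this example assumes /bin/sh as a stand-in for it: a forked child replaces
itself with the shell, the shell prints its own pid, and the parent
compares that with the pid fork() returned. The function name
exec_keeps_pid is made up for the illustration.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the exec'ed program reports the same pid the child had
 * before the exec, 0 if it differs, -1 on error. */
int exec_keeps_pid(void)
{
    int fds[2];
    char buf[32];
    ssize_t n;
    pid_t child;

    if (pipe(fds) != 0)
        return -1;
    child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: send stdout down the pipe, then overwrite this memory
         * image with /bin/sh (the stand-in for the checkpoint file). */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)NULL);
        _exit(127);                /* reached only if execl() failed */
    }

    /* Parent: read the pid the new image printed and reap the child. */
    close(fds[1]);
    n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    waitpid(child, NULL, 0);
    if (n <= 0)
        return -1;
    buf[n] = '\0';
    return strtol(buf, NULL, 10) == (long)child;
}
```

The fork() is only there so the caller survives to make the comparison; a
rollback as Dave describes it would call execl() directly in the running
process.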
[ In Dave's last correspondence I received his code and, lo and behold,
it also handles sockets (almost) ]
From: davy at relay.ubc.ca (Dave Curry) [ Message 3 ]
>Here it comes... I looked through it, and it seems that it already does
>catch some of the socket system calls (the ones that allocate file
>descriptors), but there's also code that checks to see if the
>descriptor is a socket in chkpt.c and restore.c, so you'll need to fix
>that. Also check the two #ifdef vax sections, which will require a
>few lines of assembler if you're not on a Vax.
>
>Finally, check the Makefile - it probably doesn't install things where
>you will be wanting them...
>
>--Dave
>
> [ The actual code is deleted ]
[ I believe Dave's code will work and I was in the process of getting
it compiled when our Suns went down. They will be up this weekend
I hope and early next week I should be able to test it ]
-----------------
From: uunet!hao.UCAR.EDU!pag (Peter Gross)
>One problem: file descriptors do not always refer to files. Depending
>on which version of Unix you are running, they could be pipes, sockets,
>fifo's, etc. Thus your solution of redoing the stdio lib to trap
>file names would leave some holes.
>
>--peter gross
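
Peter's point can be made concrete: fstat() reports what kind of object a
descriptor refers to, so a checkpointer can at least detect the cases
where no file name can exist. The sketch below is an illustration only;
the enum, the classify_fd and report_std_fds names, and the grouping of
character and block devices under one label are choices made for the
example.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum fd_kind {
    FD_INVALID, FD_REGULAR, FD_DIRECTORY, FD_PIPE,
    FD_SOCKET, FD_DEVICE, FD_OTHER
};

static const char *const fd_kind_name[] = {
    "not open", "regular file", "directory", "pipe or fifo",
    "socket", "device", "other"
};

/* Classify an open descriptor by the file type bits fstat() returns. */
enum fd_kind classify_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return FD_INVALID;
    if (S_ISREG(st.st_mode))
        return FD_REGULAR;
    if (S_ISDIR(st.st_mode))
        return FD_DIRECTORY;
    if (S_ISFIFO(st.st_mode))      /* pipes and named fifos look alike */
        return FD_PIPE;
    if (S_ISSOCK(st.st_mode))
        return FD_SOCKET;
    if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode))
        return FD_DEVICE;
    return FD_OTHER;
}

/* Print what stdin, stdout and stderr are currently connected to. */
void report_std_fds(void)
{
    int fd;

    for (fd = STDIN_FILENO; fd <= STDERR_FILENO; fd++)
        printf("fd %d: %s\n", fd, fd_kind_name[classify_fd(fd)]);
}
```

Only descriptors classified as regular files (or directories) are
candidates for a name lookup, and even those may have been unlinked.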
-----------------
From: alberta!edm!steve
>stat(2) gives both an inode and a device #. I'm not exactly sure about the
>mapping from device # to device name/mount point but, as a worst case, you could
>always stat /<mountpoint>/. for each mounted device and then stop when you
>get a matching value.
>
> One point: from an inode #, the best that I can figure out how to get is
>A file name. If a file has multiple links, then you can sometimes find
>multiple names for the file but, in most cases, this should not be a problem
>for you.
>
>btw: the way ncheck (probably) gets file names from inode #s is to stat every
>file in the appropriate mounted filesystem. To speed things up, it might be
>worthwhile to assume that most of the files are in (or below) the current
>directory, and start by spanning that tree before you go thru the rest of the
>file system.
> Sorry for being so verbose.
>-------------
>Stephen Samuel (userzxcv at ualtamts.bitnet or alberta!edm!steve)
>MS-DOS : CPM impersonating UNIX ** OS/2 : IBM impersonating APPLE
>
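
The worst-case mapping Stephen suggests, from a device number to a mount
point, might look like the sketch below. The list of candidate mount
points is assumed to be supplied by the caller, from wherever the system
records its mounted filesystems; find_mount_point is a hypothetical name
and the 4096-byte path buffer is an arbitrary choice.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Stat "<mountpoint>/." for each candidate and return the index of the
 * first one whose device number equals `dev`, or -1 if none matches. */
int find_mount_point(dev_t dev, const char *const mounts[], size_t n)
{
    struct stat st;
    char path[4096];
    size_t i;

    for (i = 0; i < n; i++) {
        int len = snprintf(path, sizeof path, "%s/.", mounts[i]);

        if (len < 0 || (size_t)len >= sizeof path)
            continue;              /* candidate path too long; skip it */
        if (stat(path, &st) != 0)
            continue;              /* not there or not accessible */
        if (st.st_dev == dev)
            return (int)i;
    }
    return -1;
}
```

The device number to look for is the st_dev field that fstat() returns
for the open descriptor. If the same device is mounted in more than one
place, this sketch reports only the first candidate that matches.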
-----------------
From: uunet!gatech.gatech.edu!emory!vss (V.S.Sunderam)
> I just read your recent postings regarding checkpointing & wanted
> to let you know of our attempts in this regard. Our main
> interest is process migration, but checkpoint restarts are a
> special case & we do have some software that does this for Suns.
> However, we do not (yet) handle processes that use sockets; the
> only other limitation is that the process use only NFS files.
>
> The Winter 88 Usenix proceedings (pp 357) has our paper that
> describes the mechanisms & the software. If you are interested
> I would be happy to give you more info and/or source code.
>
> V.S.Sunderam
> Dept.of Math & CS
> Emory University
> Atlanta, GA 30322
> vss at mathcs.emory.edu
> ...!gatech!emory!vss
-----------------
From: der Mouse <mcgill-vision!uunet!Larry.McRCIM.McGill.EDU!mouse> [Message 1]
>I implemented something similar once. What I did was to checkpoint a
>process into a file for later resumption, but the constraints were
>somewhat different. In particular, the whole point was to be able to
>restore a simulator run after a crash, which makes restoring open
>files and so on effectively impossible. This is the difficult part of
>this: open files. My "solution" was to force the program to close all
>files before checkpointing; this was feasible in our case.
>
>Have you considered forking and letting one process run on, with the
>"resumption" consisting of switching to the other process? Depending
>on what you want, this might be good enough.
>
>Doing this would involve just adding two syscalls, one to dump a
>process and one to restore it. Yes, it's possible. I wouldn't attempt
>it without kernel source, but then I get very dogmatic about having
>source. I'd be glad to send you the code I have for dumping and
>restoring later, in another process, though it won't be directly useful.
>
> der Mouse
>
> old: mcgill-vision!mouse
> new: mouse at larry.mcrcim.mcgill.edu
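
der Mouse's fork suggestion needs no kernel changes and can be sketched
as below. This is one possible reading of it, not his code: the function
names are invented, and the frozen copy is parked with SIGSTOP until it
is needed. Note that the copy has a different pid from the process that
ran on, so by itself this does not meet a same-pid requirement, and that
the two processes share their open file descriptions, so file offsets are
not rolled back.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take a snapshot by forking a copy that immediately stops itself.
 * Returns the copy's pid in the process that runs on, 0 in the copy
 * once it has been resumed, and -1 on error. */
pid_t snapshot(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);            /* frozen here until SIGCONT arrives */
        return 0;
    }
    /* Wait until the copy has really stopped, so that a later SIGCONT
     * cannot arrive before the SIGSTOP and be lost. */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
        return -1;
    return pid;
}

/* Roll back: wake the frozen copy and wait for it to finish. Returns
 * its exit status, or -1 on error. A real program would more likely
 * _exit() at this point and let the copy carry on alone. */
int resume_snapshot(pid_t pid)
{
    int status;

    if (kill(pid, SIGCONT) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Throw away a snapshot that is no longer needed. */
void discard_snapshot(pid_t pid)
{
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
}
```

The caller tests the return value of snapshot() the same way it would
test fork(): a return of 0 means execution has just been rolled back to
the snapshot point.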
[ I wrote the >> parts ]
From: der Mouse <mcgill-vision!uunet!Larry.McRCIM.McGill.EDU!mouse>[Message 2]
>> 1) If it is not too much trouble could you please send the code. I
>> have implemented two routines to save and restore a process and it
>> does seem to work on small test programs and these programs must
>> not have open files. I am currently working on the problem with
>> open files.
>
>> 3) One of the limitations thrust upon me is NO KERNEL CHANGES
>
>I will be astonished if you get it to work with no kernel changes,
>unless you always use OMAGIC executables, and even then I would expect
>it to be quite a can of worms.
>
>My code consists of two syscalls, one to dump a process and the other
>to restore it. The kernel code is in the following shar as snapshot.c;
>the only other tricky part is that the user-level code surrounding the
>snapshot syscall is special. Everything but the stack pointer is saved
>on the stack to make life easier for the kernel. This code follows
>after the shar.
>
>The kernel code here is for a mtXinu 4.3+NFS system; for real 4.3 all
>that needs changing is to scrap the silly vnode code and put back the
>real inode stuff.
>
[ Actual Code Deleted ]
>
>Since you are forbidden kernel changes, this probably won't be much use
>to you. If you'd like to talk about this some more, feel free to send
>me mail.
>
>der Mouse
>
>old: mcgill-vision!mouse
>new: mouse at larry.mcrcim.mcgill.edu
--
Robert Side <rside at uvunix.uvic.cdn>
UUCP: ...!{ubc-vision,uw-beaver,ssc-vax}!uvicctr!rside
BITNET: rside at uvunix.bitnet