Checkpointing and the Rollback of Processes (SUMMARY)
Robert Side
rside at uvicctr.UUCP
Mon Aug 29 14:07:38 AEST 1988
First of all, I would like to *thank* all the people who responded
to my problem. I tried to reply to everyone, but I guess I have
not mastered the mailing program on our system yet.
Second, I have a related problem, which I
will post in another article.
I originally wrote, on checkpointing and the rollback of processes:
> I have a problem I hope somebody can help me with.
>
> Long Summary:
> I would like to be able to *checkpoint* a running process
> so that the process, which is under user control, can be rolled back to a
> given checkpoint and restarted.
>
> My idea to solve the problem:
> The way I have been thinking of solving the problem is to save
> the process's data, stack and registers when a checkpoint
> occurs, and when the user rolls back the process, the saved
> data, stack, and registers are copied into the process's memory
> image and hopefully the process will think it is back at the time
> the checkpoint was taken.
>
> Caveats:
> Sun-3 workstations running Sun UNIX 4.2 Release 3.3. There will
> be open files as well as open sockets. The ptrace system call can
> be used.
>
> What I need help with:
> I would like to know if the problem can be solved,
> what literature (if any) has been written on the above problem,
> what problems will arise, and, MOST OF ALL, how to do it.
>
> Please email responses (but I do read these groups) and I will
> summarize.
>
> *MANY* thanks in advance and any help will be greatly appreciated.
>
> Rob Side
>
> Robert Side <rside at uvunix.uvic.cdn>
> UUCP: ...!{ubc-vision,uw-beaver,ssc-vax}!uvicctr!rside
> BITNET: rside at uvunix.bitnet
--------------------
Jeff Woolsey <uw-beaver!ames!ucbcad!nsc.NSC.COM!woolsey> writes
You've neglected one biggie: open files, and their positions. Another,
not quite so biggie: process environment (particularly the current
working directory, if the process has written out files it will later
want to read).
Of course, if the checkpoint is handled by the program itself, it can
make sure that it happens at a good time (no open files, etc.). If the
checkpoint is handled by something external, so that you could use it
to checkpoint ANYTHING (except programs running with privilege), you'll
have to worry about all this stuff.
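A minimal sketch of the bookkeeping Jeff describes, assuming a POSIX system and that the caller already knows which descriptors are open: it records the current working directory and each descriptor's offset at checkpoint time and puts them back on rollback. `struct env_snapshot`, `snapshot_env`, `restore_env`, and `demo_env_rollback` are hypothetical names, and the fixed limits are arbitrary. It does not undo changes to file contents.

```c
#include <stdlib.h>     /* mkstemp */
#include <string.h>     /* strcmp */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* getcwd, chdir, lseek, write, close, unlink */

#define CKPT_MAX_FDS 16
#define CKPT_PATH_MAX 4096

/* Hypothetical snapshot: working directory plus the offset of each listed fd. */
struct env_snapshot {
    char cwd[CKPT_PATH_MAX];
    int nfds;
    int fds[CKPT_MAX_FDS];
    off_t offsets[CKPT_MAX_FDS];
};

/* Record the cwd and each descriptor's offset. Returns 0, or -1 on error. */
int snapshot_env(struct env_snapshot *s, const int *fds, int nfds)
{
    if (nfds < 0 || nfds > CKPT_MAX_FDS || getcwd(s->cwd, sizeof s->cwd) == NULL)
        return -1;
    s->nfds = nfds;
    for (int i = 0; i < nfds; i++) {
        s->fds[i] = fds[i];
        /* Seeking 0 bytes from SEEK_CUR reports the offset without moving it. */
        s->offsets[i] = lseek(fds[i], 0, SEEK_CUR);
        if (s->offsets[i] == (off_t)-1)
            return -1;
    }
    return 0;
}

/* Return to the saved cwd and offsets. Data written to the files since
 * the snapshot is NOT removed. Returns 0, or -1 on error. */
int restore_env(const struct env_snapshot *s)
{
    if (chdir(s->cwd) != 0)
        return -1;
    for (int i = 0; i < s->nfds; i++)
        if (lseek(s->fds[i], s->offsets[i], SEEK_SET) == (off_t)-1)
            return -1;
    return 0;
}

/* Demo: write 3 bytes, snapshot, write 4 more and chdir("/"), then restore.
 * Returns the restored offset (3 if everything worked) or -1. */
long demo_env_rollback(void)
{
    char path[] = "/tmp/ckptXXXXXX";
    char now[CKPT_PATH_MAX];
    struct env_snapshot s;
    long result = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (write(fd, "abc", 3) == 3 && snapshot_env(&s, &fd, 1) == 0 &&
        write(fd, "defg", 4) == 4 && chdir("/") == 0 &&
        restore_env(&s) == 0 && getcwd(now, sizeof now) != NULL &&
        strcmp(now, s.cwd) == 0)
        result = (long)lseek(fd, 0, SEEK_CUR);
    close(fd);
    unlink(path);
    return result;
}
```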
Good luck.
Jeff Woolsey woolsey at nsc.NSC.COM -or- woolsey at umn-cs.cs.umn.EDU
--------------------
uunet!jetson.UPMA.MD.US!john (John Owens) writes
Check out the undump mechanism used in GNU Emacs. It writes an
executable image of the current process. It's used to turn certain
pre-loaded data into shared read-only text, but you could adapt it to
your uses. The only problem is knowing what your open files are. If
you are able to, you could set a flag in the dumped image that your
program will read on start, and it will reopen the files, fix the
stack, and do a longjmp to a setjmp that you've stored before the
undump. You can also do an ftell on all the files during the
checkpoint and an fseek during the restore...
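For stdio streams the saved position comes from `ftell` and is restored with `fseek` (`lseek` is the corresponding call on a raw descriptor). The sketch below assumes the program remembers each stream's path and a suitable mode for reopening it; `struct file_ckpt`, `file_checkpoint`, `file_restore`, and `demo_file_restore` are hypothetical. An output file should be reopened with something like `"r+"` rather than `"w"`, which would truncate it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one open stream: enough to reopen it later. */
struct file_ckpt {
    char path[256];
    char mode[4];   /* mode to use when REopening, e.g. "r" or "r+" */
    long offset;    /* ftell() value at checkpoint time */
};

/* At checkpoint: remember where the stream is. Output streams should be
 * fflush()ed first so their data is really in the file. Returns 0 or -1. */
int file_checkpoint(FILE *fp, const char *path, const char *reopen_mode,
                    struct file_ckpt *ck)
{
    if (strlen(path) >= sizeof ck->path || strlen(reopen_mode) >= sizeof ck->mode)
        return -1;
    ck->offset = ftell(fp);         /* -1L on failure */
    if (ck->offset < 0)
        return -1;
    strcpy(ck->path, path);
    strcpy(ck->mode, reopen_mode);
    return 0;
}

/* At restore: reopen by name and seek back. Returns NULL on failure. */
FILE *file_restore(const struct file_ckpt *ck)
{
    FILE *fp = fopen(ck->path, ck->mode);
    if (fp != NULL && fseek(fp, ck->offset, SEEK_SET) != 0) {
        fclose(fp);
        fp = NULL;
    }
    return fp;
}

/* Demo: read 2 chars of "abcde", checkpoint, close the stream as if the
 * process had been restarted, reopen, and return the next char read. */
int demo_file_restore(const char *path)
{
    struct file_ckpt ck;
    int c = EOF;
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return EOF;
    fputs("abcde", fp);
    fclose(fp);
    if ((fp = fopen(path, "r")) == NULL) {
        remove(path);
        return EOF;
    }
    getc(fp);                       /* consume 'a' */
    getc(fp);                       /* consume 'b' */
    if (file_checkpoint(fp, path, "r", &ck) == 0) {
        fclose(fp);                 /* simulate losing the stream */
        fp = file_restore(&ck);
        if (fp != NULL)
            c = getc(fp);           /* first character after the checkpoint */
    }
    if (fp != NULL)
        fclose(fp);
    remove(path);
    return c;
}
```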
Good luck!
---
John Owens john at jetson.UPMA.MD.US
SMART HOUSE L.P. uunet!jetson!john (old uucp)
+1 301 249 6000 john%jetson.uucp at uunet.uu.net (old internet)
--------------------
uunet!unisoft!cander (Charles Anderson) writes
I will assume that you don't care about files being changed. Rolling
them back (without just copying them) could be a problem without some
help from the O.S. Here's a simple solution that the 4.2 dump program
uses: fork and let the child do the work/transaction. If you need to
rollback, just have the child exit. The parent is then in exactly the
same state as when the "checkpoint" happened. Dump uses this to deal
with potential tape problems. You could do any number of forks (up to
the per-user process limit) to maintain any number of current
checkpoints. To roll forward or "commit the transaction" you could
signal the parent(s) and have him/her/them exit. I realize it's kind
of quick and dirty and it may be expensive if the process is big, but
it will work.
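A sketch of the fork-based checkpoint, assuming POSIX `fork`, `pipe`, and `waitpid`. Instead of a signal, the child tells the parent to exit by writing one byte to a pipe; if the child exits without writing, the parent reads end-of-file and carries on in the checkpointed state. Only one level of checkpoint is handled, and all names (`ckpt_take`, `ckpt_commit`, `ckpt_rollback`, `demo_fork_rollback`) are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { CKPT_ERROR = -1, CKPT_CHILD = 0, CKPT_ROLLED_BACK = 1 };

static int commit_fd = -1;  /* child's write end of the pipe back to its parent */

/* Take a checkpoint by forking (one level only in this sketch).
 * The child returns CKPT_CHILD and goes on to do the work. The parent
 * blocks, frozen in the checkpointed state, until the child either
 * commits (the parent then exits) or exits without committing (the
 * parent then returns CKPT_ROLLED_BACK). */
int ckpt_take(void)
{
    int p[2];
    char c;
    ssize_t n;
    pid_t pid;

    if (pipe(p) != 0)
        return CKPT_ERROR;
    if ((pid = fork()) < 0) {
        close(p[0]);
        close(p[1]);
        return CKPT_ERROR;
    }
    if (pid == 0) {                 /* child: continue past the checkpoint */
        close(p[0]);
        commit_fd = p[1];
        return CKPT_CHILD;
    }
    close(p[1]);                    /* parent: wait for the child's verdict */
    do
        n = read(p[0], &c, 1);
    while (n < 0 && errno == EINTR);
    close(p[0]);
    if (n == 1)                     /* child committed: old state is discarded */
        _exit(0);
    waitpid(pid, NULL, 0);          /* EOF (or error): child gone, resume here */
    return CKPT_ROLLED_BACK;
}

/* In the child: keep the new state and let the parent exit. */
int ckpt_commit(void)
{
    ssize_t w;
    if (commit_fd < 0)
        return -1;
    w = write(commit_fd, "C", 1);
    close(commit_fd);
    commit_fd = -1;
    return w == 1 ? 0 : -1;
}

/* In the child: abandon the work; the parent resumes at the checkpoint. */
void ckpt_rollback(void)
{
    _exit(1);
}

static int balance = 100;           /* hypothetical state changed by the "transaction" */

/* Withdraw 30 inside a checkpoint, then roll back. Returns the balance
 * seen afterwards in the surviving (parent) process. */
int demo_fork_rollback(void)
{
    int r = ckpt_take();
    if (r == CKPT_CHILD) {
        balance -= 30;              /* only the child's copy changes */
        ckpt_rollback();            /* does not return */
    }
    return r == CKPT_ROLLED_BACK ? balance : -1;
}
```

One side effect worth knowing: on commit the original process exits, so a shell that started the program sees it finish while the child is still running.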
Otherwise, you could try to write the whole data segment out to disk to
checkpoint and do a setjmp(). Then to rollback, you could read the
data segment back in and longjmp(). I don't know if it would work, but
it sounds good.
Let me know what you decide on. It sounds like an interesting
problem.
Charles. {sun,uunet,ucbvax,pyrmaid}!unisoft!cander
--------------------
uunet!dalsqnt!vector!chip (Chip Rosenthal) writes
>The way I have been thinking to solve the problem is to save
>the process's data, stack and registers when a checkpoint
>occurs
Setjmp/longjmp does this for the stack and registers.
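One qualification: `setjmp` records the registers, including the stack pointer and program counter, but it does not copy the contents of the stack or the data segment, and `longjmp` is only valid while the function that called `setjmp` is still active. The short sketch below (names hypothetical) shows that memory changed after the `setjmp` keeps its new value after the `longjmp`.

```c
#include <setjmp.h>

static jmp_buf env;
static int counter;   /* ordinary memory: longjmp does not restore it */

/* Returns the value of counter seen after jumping back to setjmp. */
int demo_longjmp_memory(void)
{
    counter = 0;
    if (setjmp(env) == 0) {
        counter = 42;       /* change memory after the "checkpoint" */
        longjmp(env, 1);    /* registers restored; setjmp now returns 1 */
    }
    return counter;         /* still 42: memory was not rolled back */
}
```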
---
Chip Rosenthal chip at vector.UUCP | I've been a wizard since my childhood.
Dallas Semiconductor 214-450-0486 | And I've earned some respect for my art.
--------------------
der Mouse <mcgill-vision!uunet!Larry.McRCIM.McGill.EDU!mouse> writes
I implemented something similar once. What I did was to checkpoint a
process into a file for later resumption, but the constraints were
somewhat different. In particular, the whole point was to be able to
restore a simulator run after a crash, which makes restoring open
files and so on effectively impossible. Open files are the difficult part of
this. My "solution" was to force the program to close all
files before checkpointing; this was feasible in our case.
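A minimal sketch of checkpointing to a file for resumption after a crash, in the spirit of this approach: the state is plain data with no open files or pointers, and it is written to a temporary file and then renamed into place, so a crash during the write leaves the previous checkpoint intact. `struct sim_state`, `sim_save`, `sim_load`, and `demo_sim_resume` are hypothetical. The raw binary format is only readable by the same program on the same kind of machine, and surviving a system crash (not just a process crash) would also require forcing the data to disk before the rename.

```c
#include <stdio.h>

/* Hypothetical simulator state: no pointers and no open FILE handles,
 * since neither survives a crash and restart. */
struct sim_state {
    long step;
    double x, v;
};

/* Write to "<path>.tmp", then rename over <path>. Returns 0 or -1. */
int sim_save(const struct sim_state *s, const char *path)
{
    char tmp[512];
    FILE *fp;
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);

    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    if ((fp = fopen(tmp, "wb")) == NULL)
        return -1;
    if (fwrite(s, sizeof *s, 1, fp) != 1) {
        fclose(fp);
        remove(tmp);
        return -1;
    }
    if (fclose(fp) != 0 || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Load a checkpoint. Returns 1 if loaded, 0 if it could not be opened
 * (treated as "no checkpoint"), -1 on a short read. */
int sim_load(struct sim_state *s, const char *path)
{
    FILE *fp = fopen(path, "rb");
    size_t n;

    if (fp == NULL)
        return 0;
    n = fread(s, sizeof *s, 1, fp);
    fclose(fp);
    return n == 1 ? 1 : -1;
}

/* Demo: run 5 steps, checkpoint, then reload as a restarted run would.
 * Returns the step number that was loaded, or -1. */
long demo_sim_resume(const char *path)
{
    struct sim_state s = {0, 0.0, 1.0}, r = {0, 0.0, 0.0};

    while (s.step < 5) {
        s.x += s.v;
        s.step++;
    }
    if (sim_save(&s, path) != 0 || sim_load(&r, path) != 1)
        return -1;
    remove(path);
    return r.step;
}
```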
Have you considered forking and letting one process run on, with the
"resumption" consisting of switching to the other process? Depending
on what you want, this might be good enough.
A general solution would involve just adding two syscalls, one to dump a
process and one to restore it. Yes, it's possible. I wouldn't attempt
it without kernel source, but then I get very dogmatic about having
source. I'd be glad to send you the code I have for dumping a process and
restoring it later, in another process, though it won't be directly useful.
der Mouse
old: mcgill-vision!mouse
new: mouse at larry.mcrcim.mcgill.edu
----------------
Again, thanks to those who replied.
Rob Side
--
Robert Side <rside at uvunix.uvic.cdn>
UUCP: ...!{ubc-vision,uw-beaver,ssc-vax}!uvicctr!rside
BITNET: rside at uvunix.bitnet
More information about the Comp.unix.questions
mailing list