SUMMARY: which is more efficient...
Paul D. Anderson
pda at stiatl.UUCP
Fri Mar 2 07:22:34 AEST 1990
Here is the original question I posed:
|Which is more efficient (i.e., cheaper and faster) in a program?
| - to chdir("/path") and then unlink("file") for each file in the ...
| - to specify the entire path in each call to unlink()?
|
|I know of namei() in the kernel; it has been suggested that doing
|the chdir() first will mean namei() will not have to be invoked as
|often. But doesn't unlink() have to chase back up the directory
|structure (re-construct the directory tree) in order to ascertain
|whether the correct permissions exist to allow removal of the file?
|If so, wouldn't calling unlink() with the full pathname be more
|efficient, since all the path information is handed to unlink()?
The responses were thorough and varied, and some even included test
source. Be sure to see the timing-test results near the end of this
summary. My thanks to everyone who sent me information!
What follows are the [edited] summaries...
--
>From emory!ulysses.att.com!ekrell Tue Feb 20 22:15:24 1990
Eduardo Krell, AT&T Bell Laboratories, Murray Hill, NJ
It's cheaper to chdir to that directory first and then unlink the
files using a relative pathname. How much cheaper will depend on
the version of Unix. It makes a bigger difference in old versions
of System V. BSD uses a namei cache and that diminishes the benefits
of doing the chdir(), but it will still run faster.
unlink() doesn't need to check the permissions of the directory
hierarchy up to the root; it only cares about the permissions
on the directories you specified in the argument. If you call
unlink() with just a simple relative file name like "foo.c",
it will check the permissions of the current working directory,
for which there's already an inode pointer in the user structure.
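The tradeoff Krell describes can be sketched in C. This is a hypothetical illustration (the function names are mine, not from any posted code): both routines remove the same set of files, but the first makes the kernel resolve every path component on every call, while the second resolves the directory once. Note that the chdir() variant changes the process's working directory as a side effect.

```c
#include <stdio.h>
#include <unistd.h>

/* Variant 1: hand unlink() a full path each time; the kernel walks
 * every component of "dir" for every file removed. */
int remove_by_full_path(const char *dir, char **names, int n)
{
    char path[1024];
    int i, removed = 0;

    for (i = 0; i < n; i++) {
        sprintf(path, "%s/%s", dir, names[i]);
        if (unlink(path) == 0)
            removed++;
    }
    return removed;
}

/* Variant 2: chdir() once, then unlink() simple names; each lookup
 * starts from the already-resolved current-directory inode. */
int remove_by_chdir(const char *dir, char **names, int n)
{
    int i, removed = 0;

    if (chdir(dir) == -1)
        return -1;
    for (i = 0; i < n; i++)
        if (unlink(names[i]) == 0)
            removed++;
    return removed;
}
```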
--
>From gatech!ames!pacbell!ctnews!unix386!dick Wed Feb 21 03:00:30 1990
Dick Hacking
Why not put all the files in their own filesystem and then
do a mkfs when you are finished with them.... that is *MUCH*
faster than trying to unlink anything. [novel solution!]
--
>From emory!cs.UMD.EDU!chris Wed Feb 21 04:40:22 1990
Chris
chdir() is more efficient.
No, unlink need chase nothing: permission to unlink files is based on
the permissions of the directory in which the files reside. If that
directory can be written (by user/group/other, whichever matches first),
files in that directory may be unlinked.
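Chris's point is easy to demonstrate: a file stripped of every permission bit can still be unlinked, as long as the directory holding it is writable. A minimal sketch (the helper name and the /tmp path are mine, for illustration):

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a file, remove every permission bit from it, then unlink it.
 * The unlink succeeds anyway: what matters is write permission on the
 * containing directory, not the mode of the file itself. */
int unlink_mode000(const char *path)
{
    int fd = creat(path, 0600);

    if (fd == -1)
        return -1;
    close(fd);
    chmod(path, 0);      /* no read, write, or execute for anyone */
    return unlink(path); /* still succeeds in a writable directory */
}
```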
--
>From gatech!mailrus!uunet!prcrs!wrwalke Wed Feb 21 05:00:17 1990
bill.
on all the systems i work on (BSD 4.2, Ultrix, HP-UX), "file"
and "/full/path/to/file" are both resolved to inode numbers. the
unlink merely drops the link count on that inode; if it has no
remaining links, the inode is freed for later use. the chdir()
call merely causes another inode lookup and a change to the current
process info. in other words, the chdir() would be a waste of time.
[i presume he means it would be a waste of time if I only
deleted one file in a directory before moving to the next.]
another note: be very careful in unlinking dirs; you are freeing
the inode of a file that must still remain in order to physically
access the other files under it. fsck will fix it, but it can be
messy on the screen and cause a shortage of inodes (in sysV at
least), as the numbers are not freed.
--
>From stiatl.uucp!bgi Tue Feb 20 16:32:25 1990
Brad
Well... according to _The Design of the UNIX Operating System_ by
Maurice J. Bach, the most efficient is to chdir first. Reason being
that unlink doesn't have to reconstruct the path back to root checking
permissions. I was suspicious of this until I tested it myself.
[Brad performed the same test as Jon Kamens, see further down.]
--
>From emory!cbnewsi!npl Wed Feb 21 15:30:09 1990
Nick Landsberg, AT&T Bell Laboratories
Just based on intuition and some knowledge of the internals, I would
venture that the "chdir" approach is MUCH faster, especially if "path"
is long. If "path" is long then every element in the path must be
parsed by the kernel for every unlink call, the appropriate i-node
looked up, the next element found, etc. With the chdir() approach,
the path lookup is only done once and then only the current directory
searched for every file to be unlinked.
You've stumbled into a "security hole" in the system. The only check
the kernel makes before unlinking a file is whether you have write
permission in the current directory, since you are changing the
contents of that directory. (This may not be true of all flavors of
Unix; your mileage may vary.) Thus, unlink() does not traverse the
directory tree backwards to check all directories for write
permission; you aren't changing any of them. Whether this is
"correct" behavior is questionable, but the other way is inefficient.
--
>From gatech!uflorida!ki4pv!cdis-1!tanner Thu Feb 22 11:31:11 1990
If you unlink using the entire path each time, then you will have to
run through the entire set of directories (in a kernel routine most
likely called "namei") to find the file, do permission checks, & check
them each time.
If, on the other hand, you chdir() to the offending directory, you
can unlink the files by name without having to scan the path and read
each directory on the way. However, you have to remember where you
were before (and change back), or relative pathnames will stop working.
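The save-and-restore step can be sketched with the open(".")/fchdir() idiom (the helper below is hypothetical; on systems without fchdir(), getcwd() followed by a second chdir() does the same job):

```c
#include <fcntl.h>
#include <unistd.h>

/* Unlink 'name' inside 'dir', then put the working directory back so
 * the caller's relative pathnames keep working. */
int unlink_in_dir(const char *dir, const char *name)
{
    int here, ret;

    /* Hold an open descriptor on the current directory so we can
     * return to it without re-resolving its path. */
    if ((here = open(".", O_RDONLY)) == -1)
        return -1;
    if (chdir(dir) == -1) {
        close(here);
        return -1;
    }
    ret = unlink(name);
    if (fchdir(here) == -1)  /* back to where we were */
        ret = -1;
    close(here);
    return ret;
}
```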
--
>From gatech!mailrus!uunet!attcan!telly!eci386!jmm Thu Feb 22 11:31:14 1990
John Macdonald
Why not use: system( "rm -rf /path/*" ) or system( "rm /path/*" )?
(The first for recursively removing sub-directories as well.)
Unless there was major foolishness on the part of the implementors
of your Unix system, this should be very close to optimal for large
numbers of files (as long as the number of files doesn't overflow
the command line: in such a case you could use:
system( "find /path -type f -print | xargs rm; rm -rf /path" ) or
system( "find /path -type f -print | sed -e '/^\/path\/.*\//d' | xargs rm" )
although the second does have to traverse the entire tree even though
it is only removing the top-level files.)
--
>From gatech!mit-eddie!pit-manager.MIT.EDU!jik Thu Feb 22 11:31:23 1990
jik at Athena.MIT.EDU, Office: 617-253-8495
Jonathan I. Kamens, MIT Project Athena
[doesn't unlink() have to chase back up the directory tree?]
No, unlink() doesn't have to do anything of the sort. Directory
permissions in Unix are not based on permissions of higher-up directories
in any direct manner. In other words, once you've managed to get
into a directory, you can perform operations in that directory paying
attention only to the permissions in that particular directory.
Example:
pit-manager% ls -ldg foo foo/bar
drwxrwxr-x 3 jik wheel 512 Feb 21 11:59 foo/
drwxrwxr-x 2 jik wheel 512 Feb 21 11:59 foo/bar/
pit-manager% cd foo/bar
pit-manager% chmod 000 ../../foo
pit-manager% touch frep
pit-manager% ls -ldg .
drwxrwxr-x 2 jik wheel 512 Feb 21 12:00 ./
pit-manager% ls -ldg ..
d--------- 3 jik wheel 512 Feb 21 11:59 ../
So, you see, I was able to perform operations in "bar" even after I
could no longer do so in "foo".
One of the advantages of this is that if you make a directory executable
but not readable, and then put readable+executable directories underneath
it, then people can cd into those directories and read from them
if they know the names, but not if they don't (i.e. they can't do
an "ls" on the parent directory, but they can use "cd parent/child"
if they know the name of the child).
--
>From gatech!BBN.COM!pineapple.bbn.com!bbn.com!rsalz Fri Feb 23 04:15:20 1990
Rich $alz
Once you chdir to the directory, to remove a file all you need is
write permission in the dir. The whole path all the way down only
counts in the chdir.
Now, it is possible that
cd /foo/bar/baz ; rm zap
will fail (no "r" perm in baz, I guess) while
rm /foo/bar/baz/zap
would succeed.
Doesn't happen very often.
--
>From emory!uunet.UU.NET!prcrs!wrwalke Fri Feb 23 19:40:04 1990
William Walker
boy, you did it now; we have a battle going on as to whether the
chdir() system call costs more than tracing absolute pathnames.
apparently if the dirs are rather empty, fewer than say five or six
files each, the consensus is that the extra syscall doesn't pay off.
if the dir is very full, say in the 20-200 file range, the lookup
overhead would make up for the second syscall. everyone and his
brother will now be writing test code to prove a point.
on most systems only root can remove a non-empty dir, and on some
systems even a busy executable. if you are running as root, make damn
sure you don't blow away a non-empty dir. fsck will make you wish you
were sitting by the pool sipping margaritas (unless you run it in
"SHUT-UP" mode, -p ??).
--
>From emory!virtech!cpcahil Fri Feb 23 20:15:08 1990
Conor P. Cahill
Technically, the first method [chdir()/unlink()] will be slightly
more efficient, since namei() (or lookupname()) will only have to
parse the "file" portion at unlink time. However, you will probably
not be able to demonstrate a difference in executable time between
the two because of namei() caching and/or buffer caching.
The real difference would come if the path to file was very long, your
system was very heavily loaded, and the directories along the path
were very large (had lots of entries).
So, for most cases, you won't be able to tell the difference.
--
>From emory!uunet.UU.NET!auspex!auspex.com!guy Tue Feb 27 18:10:14 1990
Guy Harris
> Let me clarify my question from the previous posting:
> I know of namei() in the kernel; it has been suggested that doing
> the chdir() first will mean namei() will not have to be invoked as
> often.
No. It will just mean that "namei()", in those systems that have it,
will do less work looking up the file. In those systems that don't
have it, e.g. any system with SunOS-style vnodes - including S5R4 - the
equivalent routine will also do less work looking up the file. In both
cases, the reason is that it takes more work to look up
foo/bar/bletch/mumble.c
than it does to look up
mumble.c
because in the first case you have to search "foo" for "bar", and then
"bar" for "bletch", and then "bletch" for "mumble.c", while in the
second case you only have to search the current directory for
"mumble.c". Directory name caches help here, but they don't completely
eliminate the slowdown. In addition, it takes more CPU time to 1)
construct the longer path in your program and 2) copy it into the
kernel.
[But does not unlink() have to chase back up the dir tree?...]
No. The only permission "unlink()" requires to allow removal of the
file - or, more correctly, removal of a directory entry referring to a
file - is write permission on the directory containing that entry.
"rm", if invoked without the "-f" flag, also checks whether the file is
writable, but that doesn't require any directory searching either.
In other words,
unlink("foo/bar/bletch/mumble.c")
is less efficient than
unlink("mumble.c")
>And does the recommended method used vary based on Sys V or BSD 4.2/3 ??
I don't expect so.
--
>From emory!lzga!bogatko Tue Feb 20 22:15:21 1990
GB
My vote is a chdir, followed by unlinks.
My authority is BACH - "Design of the Unix Op. Sys", section 4.4 pages
74 and 75.
Short summary -- each path resolution requires a namei resolution
starting from the first path component; if there are three components
in the path, then three namei resolutions are required. If there is
only one component, then only one resolution is needed.
As far as I could see from the 5.2 source, unlink calls iget on the
full path if it has a full path, and avoids iget if it has ".".
Admittedly I didn't spend a lot of time looking at the stuff (it
produces headaches), but it seems to imply that chdir is the better
way to go.
Incidentally, here's how I did it once. The example suffers from
doing unlinks on complicated paths, but putting in a 'chdir' shouldn't
be too difficult.
This is System 5.3 specific.
/*
*
* cleanfiles.c
*
*/
/* INITIAL CODING DATE
Sun Sep 25 15:45:45 EDT 1988 by George M. Bogatko
HEADER FILES
*/
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <time.h>
/*---------------- IMPORTED GLOBAL VARIABLE/FUNCTION DEF'S ----------------*/
extern char *regcmp();
extern char *regex();
/*---------------- INTERNAL GLOBAL VARIABLE/FUNCTION DEF'S ----------------*/
static char sccs_id[] = "cleanfiles.c 1 1/16/89 15:15:16";
/*----
SYNOPSIS:
clean_files(directory, pattern, age)
char *directory;
char *pattern;
long age;
DESCRIPTION:
Clean_files removes all the files in 'directory' whose names
match the regular expression 'pattern' and whose modification
times exceed 'age' seconds.
RETURN:
The function returns the number of files successfully removed.
EXAMPLE:
to remove temporary files in '/usr/tmp', whose names begin with
'ConV', and are more than 10 minutes old, call:
num_removed = clean_files("/usr/tmp", "^ConV", 600);
(note the caret at the start of the pattern)
CAVEATS:
Accidentally removed files cannot be recovered. Use this
carefully.
This function uses 'regcmp' and 'regex', thus the pattern used in
the match uses the syntax of 'ed', 'ex', and 'vi' NOT 'sh'.
'regcmp' and 'regex' are located in libPW.a (cc ... -lPW).
------- */
clean_files(directory, pattern, age)
char *directory;
char *pattern;
long age;
{
DIR *dir;
struct dirent *dir_ent;
struct stat buf;
char *cmp_pat;
time_t curtime;
int num = 0;
char tempbuf[100];
if( (dir = opendir(directory)) == (DIR *)NULL )
return -1;
curtime = time( (time_t *)NULL );
cmp_pat = regcmp(pattern, (char *)NULL);
/*
*
*
*
* YOU COULD PUT A CHDIR HERE, but be careful if you want to
* go back to where you were!
*
*
*
*/
while( (dir_ent = readdir(dir)) != NULL )
{
if( regex(cmp_pat, dir_ent->d_name) != (char *)NULL )
{
sprintf(tempbuf, "%s/%s", directory, dir_ent->d_name);
stat(tempbuf, &buf);
if( (curtime - buf.st_mtime) > age )
{
if(unlink(tempbuf) != -1)
num++;
}
}
}
closedir(dir);
free(cmp_pat);
return num;
}
--
>From emory!lzga!bogatko Wed Feb 21 10:40:13 1990
GB
Never one to believe straight theory, I've run a test.
This program creates 500 dummy files in "/usr/tmp/1/2/3/4/5/6/7/8/9"
and then cleans them out, all from a C program.
I picked such a long path name under the assumption that the namei()
resolution would slow down the unlink process, and wanted to give
namei() a lulu of a path to chew on.
The results are in 'edres'; as you can see, CHDIR is the better way
to go.
This was written and tested on a 3B600, under System 5.3.1. If you're
not on such a box, that may explain why you don't have 'opendir',
'readdir', or 'closedir'.
Hope this helps.
***** CUT HERE *****
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
# makefile
# cleanmain.c
# edres
# testrm
# This archive created: Wed Feb 21 09:36:37 1990
export PATH; PATH=/bin:/usr/bin:$PATH
echo shar: "extracting 'makefile'" '(309 characters)'
if test -f 'makefile'
then
echo shar: "will not over-write existing file 'makefile'"
else
sed 's/^ X//' << \SHAR_EOF > 'makefile'
XD=
X
XCFLAGS=-I. -O -Ksd $D
XLIBS=-lPW
X
Xall:
X make cleanmainC D=-DCHDIR
X rm cleanmain.o
X make cleanmainNC
X make clean
X
XcleanmainC: cleanmain.o
X cc -o cleanmainC cleanmain.o $(LIBS)
X
XcleanmainNC: cleanmain.o
X cc -o cleanmainNC cleanmain.o $(LIBS)
X
Xclean:
X rm cleanmain.o
X
Xclobber:
X rm cleanmainC
X rm cleanmainNC
SHAR_EOF
if test 309 -ne "`wc -c < 'makefile'`"
then
echo shar: "error transmitting 'makefile'" '(should have been 309 characters)'
fi
fi
echo shar: "extracting 'cleanmain.c'" '(2114 characters)'
if test -f 'cleanmain.c'
then
echo shar: "will not over-write existing file 'cleanmain.c'"
else
sed 's/^ X//' << \SHAR_EOF > 'cleanmain.c'
X#include <stdio.h>
X#include <sys/types.h>
X#include <sys/times.h>
X#include <sys/stat.h>
X#include <dirent.h>
X#include <time.h>
X
X#define DIRECTORY "/usr/tmp/1/2/3/4/5/6/7/8/9"
X#define TMP "_RmT_"
X#define PAT "^_RmT_"
X
Xextern char *regcmp();
Xextern char *regex();
X
Xmain(argc, argv)
Xint argc;
Xchar *argv[];
X{
Xstruct tms tbuf;
Xlong btime, etime;
Xchar *tnam;
Xint i;
Xint fd;
Xint num;
Xchar *dir_tory;
X
X#ifdef CHDIR
X fprintf(stderr, "CHDIR METHOD\n");
X chdir(DIRECTORY);
X dir_tory = ".";
X#else
X fprintf(stderr, "NON-CHDIR METHOD\n");
X dir_tory = DIRECTORY;
X#endif
X
X fprintf(stderr, "creating 500 files\n");
X
X btime = times(&tbuf);
X for(i=0; i< 500; i++)
X {
X tnam = tempnam(dir_tory, TMP);
X if( (fd = creat( tnam, 0666 )) == -1 )
X {
X perror("creat failure");
X exit(-1);
X }
X close(fd);
X }
X etime = times(&tbuf);
X
X fprintf(stderr, "creating took %ld clicks\n", etime-btime);
X fprintf(stderr, "usrtime - %ld\n", tbuf.tms_utime);
X fprintf(stderr, "systime - %ld\n", tbuf.tms_stime);
X fprintf(stderr, "\n\n");
X
X fprintf(stderr, "calling clean_files\n");
X
X btime = times(&tbuf);
X num = clean_files(DIRECTORY, PAT);
X etime = times(&tbuf);
X
X fprintf(stderr, "cleaned %d files\n\n",num);
X fprintf(stderr, "cleaning took %ld clicks\n", etime-btime);
X fprintf(stderr, "usrtime - %ld\n", tbuf.tms_utime);
X fprintf(stderr, "systime - %ld\n", tbuf.tms_stime);
X fprintf(stderr, "\n\n");
X
X return 0;
X}
X
X
Xclean_files(directory, pattern)
Xchar *directory;
Xchar *pattern;
X{
XDIR *dir;
Xstruct dirent *dir_ent;
Xchar *cmp_pat;
Xint num = 0;
Xchar tempbuf[100];
Xchar *dir_tory;
X
X#ifdef CHDIR
X chdir(directory);
X dir_tory = ".";
X#else
X dir_tory = directory;
X#endif
X
X if( (dir = opendir(dir_tory)) == (DIR *)NULL )
X return -1;
X cmp_pat = regcmp(pattern, (char *)NULL);
X
X while( (dir_ent = readdir(dir)) != NULL )
X {
X if( regex(cmp_pat, dir_ent->d_name) != (char *)NULL )
X {
X/*
X * we'll have to use this sprintf in the non-chdir method, so I'll keep
X * the overhead in the chdir method.
X */
X sprintf(tempbuf, "%s/%s", dir_tory, dir_ent->d_name);
X if(unlink(tempbuf) != -1)
X num++;
X }
X }
X closedir(dir);
X free(cmp_pat);
X return num;
X}
SHAR_EOF
if test 2114 -ne "`wc -c < 'cleanmain.c'`"
then
echo shar: "error transmitting 'cleanmain.c'" '(should have been 2114 characters)'
fi
fi
echo shar: "extracting 'edres'" '(637 characters)'
if test -f 'edres'
then
echo shar: "will not over-write existing file 'edres'"
else
sed 's/^ X//' << \SHAR_EOF > 'edres'
XCHDIR METHOD
X
XCREAT USR SYS CLEAN USR SYS
X
X1510 63 493 3375 88 807
X1949 49 665 3064 68 952
X1717 55 638 3069 82 978
X1813 51 653 3054 73 938
X1683 62 633 3074 71 896
X1676 61 634 3057 85 945
X1681 43 648 3083 59 958
X1736 54 643 3061 69 936
X1687 65 626 3090 80 870
X1682 58 640 3143 66 893
X
X
X******************
X
X
XNON-CHDIR METHOD
X
XCREAT USR SYS CLEAN USR SYS
X
X1774 61 1170 3591 69 1638
X1764 72 1179 3278 89 1601
X1742 77 1162 3278 103 1616
X1738 65 1181 3253 80 1609
X1785 66 1193 3283 77 1651
X1730 75 1175 3320 90 1628
X1722 81 1177 3289 104 1633
X1756 75 1173 3256 96 1606
X1750 61 1177 3391 82 1594
X1833 66 1183 3258 90 1619
SHAR_EOF
if test 637 -ne "`wc -c < 'edres'`"
then
echo shar: "error transmitting 'edres'" '(should have been 637 characters)'
fi
fi
echo shar: "extracting 'testrm'" '(182 characters)'
if test -f 'testrm'
then
echo shar: "will not over-write existing file 'testrm'"
else
sed 's/^ X//' << \SHAR_EOF > 'testrm'
X> results
Xfor i in 1 2 3 4 5 6 7 8 9 10
Xdo
X echo CHDIR round $i
X cleanmainC 2>>results
Xdone
X
Xfor i in 1 2 3 4 5 6 7 8 9 10
Xdo
X echo NON-CHDIR round $i
X cleanmainNC 2>>results
Xdone
SHAR_EOF
if test 182 -ne "`wc -c < 'testrm'`"
then
echo shar: "error transmitting 'testrm'" '(should have been 182 characters)'
fi
fi
exit 0
# End of shell archive
--
--
Paul Anderson * h:404-565-0761 w:404-841-4000
{mathcs.emory,gatech}.edu!stiatl!pda || pda at SalesTech.Com
More information about the Comp.unix.questions
mailing list