v16i085: mtf - Map tar filenames, Part01/02
Richard L. Goerwitz
goer at midway.uchicago.edu
Tue Jan 29 12:21:43 AEST 1991
Submitted-by: goer at midway.uchicago.edu (Richard L. Goerwitz)
Posting-number: Volume 16, Issue 85
Archive-name: mtf/part01
Tar archives often come packed with filenames longer than 14 chars,
and with source code that requires that the filenames be fully
preserved. This utility, mtf, runs through the tar headers, finds all
overlong filenames, maps them to unique 14-char names, applies the same
mapping wherever those names occur in the archived text files, and
then rewrites the tar header checksums.
-Richard
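
To preview which names mtf would consider overlong before running it,
something like the following sketch may help. It is not part of mtf:
the function name overlong_names is made up for illustration, and it
assumes path names arrive one per line (for example from "tar tf").
It splits each path at "/" and prints every component longer than
14 chars, once.

```sh
# overlong_names (hypothetical helper, not part of mtf):
# read path names on stdin, one per line, and print each path
# component longer than 14 characters, without repeats.
overlong_names() {
    tr '/' '\n' | awk 'length($0) > 14 && !seen[$0]++'
}
```

A typical call would be "tar tf archive.tar | overlong_names". Note
that this only looks at path names; mtf itself also maps occurrences
of those names inside the archived text.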
---- Cut Here and feed the following to sh ----
#!/bin/sh
# This is a shell archive (produced by shar 3.49)
# To extract the files from this archive, save it to a file, remove
# everything above the "!/bin/sh" line above, and type "sh file_name".
#
# made 01/20/1991 23:34 UTC by goer at sophist.uchicago.edu
# Source directory /u/richard/Mtf
#
# existing files will NOT be overwritten unless -c is specified
# This format requires very little intelligence at unshar time.
# "if test", "cat", "rm", "echo", "true", and "sed" may be needed.
#
# This is part 1 of a multipart archive
# do not concatenate these parts, unpack them in order with /bin/sh
#
# This shar contains:
# length mode name
# ------ ---------- ------------------------------------------
# 16721 -r--r--r-- mtf.icn
# 3341 -rw-r--r-- README
# 659 -rw-r--r-- Makefile.dist
#
if test -r _shar_seq_.tmp; then
echo 'Must unpack archives in sequence!'
echo Please unpack part `cat _shar_seq_.tmp` next
exit 1
fi
# ============= mtf.icn ==============
if test -f 'mtf.icn' -a X"$1" != X"-c"; then
echo 'x - skipping mtf.icn (File already exists)'
rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting mtf.icn (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'mtf.icn' &&
X#############################################################################
X#
X# NAME: mtf3.icn
X#
X# TITLE: map tar file
X#
X# AUTHOR: Richard Goerwitz
X#
X# VERSION: 3.3
X#
X#############################################################################
X#
X# This and future versions of mtf are hereby placed in the public domain -RLG
X#
X#############################################################################
X#
X# PURPOSE: Maps 15+ char. filenames in a tar archive to 14 chars.
X# Handles both header blocks and the archive itself. Mtf is intended
X# to facilitate installation of tar'd archives on systems subject to
X# the System V 14-character filename limit.
X#
X# USAGE: mtf inputfile [-r reportfile] [-e .extensions] [-x exceptions]
X#
X# "Inputfile" is a tar archive. "Reportfile" is a file containing a
X# list of files already mapped by mtf in a previous run (used to
X# avoid clashes with filenames in use outside the current archive).
X# The -e switch precedes a list of filename .extensions which mtf is
X# supposed to leave unscathed by the mapping process
X# (single-character extensions such as .c and .o are automatically
X# preserved; -e allows the user to specify additional extensions,
X# such as .pxl, .cpi, and .icn). The final switch, -x, precedes a
X# list of strings which should not be mapped at all. Use this switch
X# if, say, you have a C file with a structure.field combination such
X# as "thisisveryverybig.hashptr" in an archive that contains a file
X# called "thisisveryverybig.h," and you want to avoid mapping that
X# portion of the struct name which matches the name of the overlong
X# file (to wit, "mtf inputfile -x thisisveryverybig.hashptr"). To
X# prevent mapping of any string (including overlong filenames) begin-
X# ning, say, with "thisisvery," use "mtf inputfile -x thisisvery."
X# Be careful with this option, or you might end up defeating the
X# whole point of using mtf in the first place.
X#
X# OUTPUT FORMAT: Mtf writes a mapped tar archive to the stdout.
X# When finished, it leaves a file called "map.report" in the current
X# directory which records what filenames were mapped and how. Rename
X# and save this file, and use it as the "reportfile" argument to any
X# subsequent runs of mtf in this same directory. Even if you don't
X# plan to run mtf again, this file should still be examined, just to
X# be sure that the new filenames are acceptable, and to see if
X# perhaps additional .extensions and/or exceptions should be
X# specified.
X#
X# BUGS: Mtf only maps filenames found in the main tar headers.
X# Because of this, mtf cannot accept nested tar archives. If you try
X# to map a tar archive within a tar file, mtf will abort with a nasty
X# message about screwing up your files. Please note that, unless you
X# give mtf a "reportfile" to consider, it knows nothing about files
X# existing outside the archive. Hence, if an input archive refers to
X# an overlong filename in another archive, mtf naturally will not
X# know to shorten it. Mtf will, in fact, have no way of knowing that
X# it is a filename, and not, say, an identifier in a C program.
X# Final word of caution: Try not to use mtf on binaries. It cannot
X# possibly preserve the correct format and alignment of strings in an
X# executable. Same goes for compressed files. Mtf can't map
X# filenames that it can't read!
X#
X####################################################################
X
X
Xglobal filenametbl, chunkset, short_chunkset # see procedure mappiece(s)
Xglobal extensions, no_nos # ditto
X
Xrecord hblock(name,junk,size,mtime,chksum, # tar header struct;
X linkflag,linkname,therest) # see readtarhdr(s)
X
X
Xprocedure main(a)
X
X usage := "usage: mtf inputfile [-r reportfile] " ||
X "[-e .extensions] [-x exceptions]"
X
X *a = 0 & stop(usage)
X
X intext := open_input_file(a[1]) & pop(a)
X
X i := 0
X extensions := []; no_nos := []
X while (i +:= 1) <= *a do {
X case a[i] of {
X "-r" : readin_old_map_report(a[i+:=1])
X "-e" : current_list := extensions
X "-x" : current_list := no_nos
X default : put(current_list,a[i])
X }
X }
X
X every !extensions ?:= (=".", tab(0))
X
X # Run through all the headers in the input file, filling
X # (global) filenametbl with the names of overlong files;
X # make_table_of_filenames fails if there are no such files.
X make_table_of_filenames(intext) | {
X write(&errout,"mtf: no overlong path names to map")
X a[1] ? (tab(find(".tar")+4), pos(0)) |
X write(&errout,"(Is ",a[1]," even a tar archive?)")
X exit(1)
X }
X
X # Now that a table of overlong filenames exists, go back
X # through the text, remapping all occurrences of these names
X # to new, 14-char values; also, reset header checksums, and
X # reformat text into correctly padded 512-byte blocks. Ter-
X # minate output with 512 nulls.
X seek(intext,1)
X every writes(output_mapped_headers_and_texts(intext))
X
X close(intext)
X write_report() # Record mapped file and dir names for future ref.
X exit(0)
X
Xend
X
X
X
Xprocedure open_input_file(s)
X intext := open("" ~== s,"r") |
X stop("mtf: can't open ",s)
X find("UNIX",&features) |
X stop("mtf: I'm not tested on non-Unix systems.")
X s[-2:0] == ".Z" &
X stop("mtf: sorry, can't accept compressed files")
X return intext
Xend
X
X
X
Xprocedure readin_old_map_report(s)
X
X initial {
X filenametbl := table()
X chunkset := set()
X short_chunkset := set()
X }
X
X mapfile := open_input_file(s)
X while line := read(mapfile) do {
X line ? {
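X if chunk := tab(many(~' \t')) & tab(upto(~' \t')) &
X lchunk := move(14) & pos(0) then {
X filenametbl[chunk] := lchunk
X insert(chunkset,chunk)
X insert(short_chunkset,chunk[1:16])
X }
X # Any line that fails to parse means the report is damaged.
X # (Testing /chunk | /lchunk here would only catch a bad first
X # line, since both variables keep their values between lines.)
X else stop("mtf: report file, ",s," seems mangled.")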
X }
X }
X
Xend
X
X
X
Xprocedure make_table_of_filenames(intext)
X
X local header # chunkset is global
X
X # search headers for overlong filenames; for now
X # ignore everything else
X while header := readtarhdr(reads(intext,512)) do {
X # tab upto the next header block
X tab_nxt_hdr(intext,trim_str(header.size),1)
X # record overlong filenames in several global tables, sets
X fixpath(trim_str(header.name))
X }
X *\chunkset ~= 0 | fail
X return &null
X
Xend
X
X
X
Xprocedure output_mapped_headers_and_texts(intext)
X
X # Remember that filenametbl, chunkset, and short_chunkset
X # (which are used by various procedures below) are global.
X local header, newtext, full_block, block, lastblock
X
X # Read in headers, one at a time.
X while header := readtarhdr(reads(intext,512)) do {
X
X # Replace overlong filenames with shorter ones, according to
X # the conversions specified in the global hash table filenametbl
X # (which were generated by fixpath() on the first pass).
X header.name := left(map_filenams(header.name),100,"\x00")
X header.linkname := left(map_filenams(header.linkname),100,"\x00")
X
X # Use header.size field to determine the size of the subsequent text.
X # Read in the text as one string. Map overlong filenames found in it
X # to shorter names as specified in the global hash table filenametbl.
X newtext := map_filenams(tab_nxt_hdr(intext,trim_str(header.size)))
X
X # Now, find the length of newtext, and insert it into the size field.
X header.size := right(exbase10(*newtext,8) || " ",12," ")
X
X # Calculate the checksum of the newly retouched header.
X header.chksum := right(exbase10(get_checksum(header),8)||"\x00 ",8," ")
X
X # Finally, join all the header fields into a new block and write it out
X full_block := ""; every full_block ||:= !header
X suspend left(full_block,512,"\x00")
X
X # Now we're ready to write out the text, padding the final block
X # out to an even 512 bytes if necessary; the next header must start
X # right at the beginning of a 512-byte block.
X newtext ? {
X while block := move(512)
X do suspend block
X pos(0) & next
X lastblock := left(tab(0),512,"\x00")
X suspend lastblock
X }
X }
X # Write out a final null-filled block. Some tar programs will write
X # out 1024 nulls at the end, since the tar format calls for two
X # null-filled 512-byte blocks as the end-of-archive marker.
X return repl("\x00",512)
X
Xend
X
X
X
Xprocedure trim_str(s)
X
X # Knock out spaces, nulls from those crazy tar header
X # block fields (some of which end in a space and a null,
X # some just a space, and some just a null [anyone know
X # why?]).
X return s ? {
X (tab(many(' ')) | &null) &
X trim(tab(find("\x00")|0))
X } \ 1
X
Xend
X
X
X
Xprocedure tab_nxt_hdr(f,size_str,firstpass)
X
X # Tab upto the next header block. Return the bypassed text
X # as a string if not the first pass.
X
X local hs, next_header_offset
X
X hs := integer("8r" || size_str)
X next_header_offset := (hs / 512) * 512
X hs % 512 ~= 0 & next_header_offset +:= 512
X if 0 = next_header_offset then return ""
X else {
X # if this is pass no. 1 don't bother returning a value; we're
X # just collecting long filenames;
X if \firstpass then {
X seek(f,where(f)+next_header_offset)
X return
X }
X else {
X return reads(f,next_header_offset)[1:hs+1] |
X stop("mtf: error reading in ",
X string(next_header_offset)," bytes.")
X }
X }
X
Xend
X
X
X
Xprocedure fixpath(s)
X
X # Fixpath is a misnomer of sorts, since it is used on
X # the first pass only, and merely examines each filename
X # in a path, using the procedure mappiece to record any
X # overlong ones in the global table filenametbl and in
X # the global sets chunkset and short_chunkset; no fixing
X # is actually done here.
X
X s2 := ""
X s ? {
X while piece := tab(find("/")+1)
X do s2 ||:= mappiece(piece)
X s2 ||:= mappiece(tab(0))
X }
X return s2
X
Xend
X
X
X
Xprocedure mappiece(s)
X
X # Check s (the name of a file or dir as recorded in the tar header
X # being examined) to see if it is over 14 chars long. If so,
X # generate a unique 14-char version of the name, and store
X # both values in the global hashtable filenametbl. Also store
X # the original (overlong) file name in chunkset. Store the
X # first fifteen chars of the original file name in short_chunkset.
X # Sorry about all of the tables and sets. It actually makes for
X # a reasonably efficient program. Doing away with both sets,
X # while possible, causes a tenfold drop in execution speed!
X
X # global filenametbl, chunkset, short_chunkset, extensions
X local j, ending
X
X initial {
X /filenametbl := table()
X /chunkset := set()
X /short_chunkset := set()
X }
X
X chunk := trim(s,'/')
X if chunk ? (tab(find(".tar")+4), pos(0)) then {
X write(&errout, "mtf: Sorry, I can't let you do this.\n",
X " You've nested a tar archive within\n",
X " another tar archive, which makes it\n",
X " likely I'll f your filenames ubar.")
X exit(2)
X }
X if *chunk > 14 then {
X i := 0
X
X if /filenametbl[chunk] then {
X # if we have not seen this file, then...
X repeat {
X # ...find a new unique 14-character name for it;
X # preserve important suffixes like ".Z," ".c," etc.
X # First, check to see if the original filename (chunk)
X # ends in an important extension...
X if chunk ?
X (tab(find(".")),
X ending := move(1) || tab(match(!extensions)|any(&ascii)),
X pos(0)
X )
X # ...If so, then leave the extension alone; mess with the
X # middle part of the filename (e.g. file.with.extension.c ->
X # file.with001.c).
X then {
X j := (15 - *ending - 3)
X lchunk:= chunk[1:j] || right(string(i+:=1),3,"0") || ending
X }
X # If no important extension is present, then reformat the
X # end of the file (e.g. too.long.file.name -> too.long.fil01).
X else lchunk := chunk[1:13] || right(string(i+:=1),2,"0")
X
X # If the resulting shorter file name has already been used...
X if lchunk == !filenametbl
X # ...then go back and find another (i.e. increment i & try
X # again); else break from the repeat loop, and...
X then next else break
X }
X # ...record both the old filename (chunk) and its new,
X # mapped name (lchunk) in filenametbl. Also record the
X # original name in chunkset, and its first fifteen chars
X # in short_chunkset.
X filenametbl[chunk] := lchunk
X insert(chunkset,chunk)
X insert(short_chunkset,chunk[1:16])
X }
X }
X
X # If the filename is overlong, return lchunk (the shortened
X # name), else return the original name (chunk). If the name,
X # as passed to the current function, contained a trailing /
X # (i.e. if s[-1]=="/"), then put the / back. This could be
X # done more elegantly.
X return (\lchunk | chunk) || ((s[-1] == "/") | "")
X
Xend
X
X
X
Xprocedure readtarhdr(s)
X
X # Read the silly tar header into a record. Note that, as was
X # complained about above, some of the fields end in a null, some
X # in a space, and some in a space and a null. The procedure
X # trim_str() may be (and in fact often _is_) used to remove this
X # extra garbage.
X
X this_block := hblock()
X s ? {
X this_block.name := move(100) # <- to be looked at later
X this_block.junk := move(8+8+8) # skip the permissions, uid, etc.
X this_block.size := move(12) # <- to be looked at later
X this_block.mtime := move(12)
X this_block.chksum := move(8) # <- to be looked at later
X this_block.linkflag := move(1)
X this_block.linkname := move(100) # <- to be looked at later
X this_block.therest := tab(0)
X }
X integer(this_block.size) | fail # If it's not an integer, we've hit
X # the final (null-filled) block.
X return this_block
X
Xend
X
X
X
Xprocedure map_filenams(s)
X
X # Chunkset is global, and contains all the overlong filenames
X # found in the first pass through the input file; here the aim
X # is to map these filenames to the shortened variants as stored
X # in filenametbl (GLOBAL).
X
X local s2, tmp_chunk_tbl, tmp_lst
X static new_chunklist
X initial {
X
X # Make sure filenames are sorted, longest first. Say we
X # have a file called long_file_name_here.1 and one called
X # long_file_name_here.1a. We want to check for the longer
X # one first. Otherwise the portion of the second file which
X # matches the first file will get remapped.
X tmp_chunk_tbl := table()
X every el := !chunkset
X do insert(tmp_chunk_tbl,el,*el)
X tmp_lst := sort(tmp_chunk_tbl,4)
X new_chunklist := list()
X every put(new_chunklist,tmp_lst[*tmp_lst-1 to 1 by -2])
X
X }
X
X s2 := ""
X s ? {
X until pos(0) do {
X # first narrow the possibilities, using short_chunkset
X if member(short_chunkset,&subject[&pos:&pos+15])
X # then try to map from a long to a shorter 14-char filename
X then {
X if match(ch := !new_chunklist) & not match(!no_nos)
X then s2 ||:= filenametbl[=ch]
X else s2 ||:= move(1)
X }
X else s2 ||:= move(1)
X }
X }
X return s2
X
Xend
X
X
X# From the IPL. Thanks, Ralph -
X# Author: Ralph E. Griswold
X# Date: June 10, 1988
X# exbase10(i,j) convert base-10 integer i to base j
X# The maximum base allowed is 36.
X
Xprocedure exbase10(i,j)
X
X static digits
X local s, d, sign
X initial digits := &digits || &lcase
X if i = 0 then return 0
X if i < 0 then {
X sign := "-"
X i := -i
X }
X else sign := ""
X s := ""
X while i > 0 do {
X d := i % j
X if d > 9 then d := digits[d + 1]
X s := d || s
X i /:= j
X }
X return sign || s
X
Xend
X
X# end IPL material
X
X
Xprocedure get_checksum(r)
X
X # Calculates the new value of the checksum field for the
X # current header block. Note that the specification says
X # that, when calculating this value, the chksum field must
X # be blank-filled.
X
X sum := 0
X r.chksum := repl(" ",8) # blank-fill all 8 bytes of the chksum field
X every field := !r
X do every sum +:= ord(!field)
X return sum
X
Xend
X
X
X
Xprocedure write_report()
X
X # This procedure writes out a list of filenames which were
X # remapped (because they exceeded the SysV 14-char limit),
X # and then notifies the user of the existence of this file.
X
X local outtext, stbl, i, j, mapfile_name
X
X # Get a unique name for the map.report (thereby preventing
X # us from overwriting an older one).
X mapfile_name := "map.report"; j := 1
X until not close(open(mapfile_name,"r"))
X do mapfile_name := (mapfile_name[1:11] || string(j+:=1))
X
X (outtext := open(mapfile_name,"w")) |
X (outtext := open(mapfile_name := "/tmp/map.report","w")) |
X stop("mtf: Can't find a place to put map.report!")
X stbl := sort(filenametbl,3)
X every i := 1 to *stbl -1 by 2 do {
X match(!no_nos,stbl[i]) |
X write(outtext,left(stbl[i],35," ")," ",stbl[i+1])
X }
X write(&errout,"\nmtf: ",mapfile_name," contains the list of changes.")
X write(&errout," Please save this list!")
X close(outtext)
X return &null
X
Xend
SHAR_EOF
true || echo 'restore of mtf.icn failed'
rm -f _shar_wnt_.tmp
fi
# ============= README ==============
if test -f 'README' -a X"$1" != X"-c"; then
echo 'x - skipping README (File already exists)'
rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting README (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'README' &&
XNAME: mtf
X
XLANGUAGE: Icon
X
XAUTHOR: Richard Goerwitz (goer at sophist.uchicago.edu)
X
XPURPOSE: Maps 15+ char. filenames in a tar archive to 14 chars.
XHandles both header blocks and the archive itself. Mtf is intended to
Xfacilitate installation of tar'd archives on systems subject to a
X14-character filename limit.
X
XINSTALLATION: Cp Makefile.dist to Makefile and make. If all goes
Xwell, and you have root privileges, edit the Makefile to reflect
Xyour local file structure, and make install.
X
XUSAGE: mtf inputfile [-r reportfile] [-e .extensions] [-x exceptions]
X
X"Inputfile" is a tar archive. "Reportfile" is a file containing a list
Xof files already mapped by mtf in a previous run (used to avoid
Xclashes with filenames in use outside the current archive). The -e
Xswitch precedes a list of filename .extensions which mtf is supposed
SHAR_EOF
true || echo 'restore of README failed'
fi
echo 'End of part 1'
echo 'File README is continued in part 2'
echo 2 > _shar_seq_.tmp
exit 0
exit 0 # Just in case...
--
Kent Landfield INTERNET: kent at sparky.IMD.Sterling.COM
Sterling Software, IMD UUCP: uunet!sparky!kent
Phone: (402) 291-8300 FAX: (402) 291-4362
Please send comp.sources.misc-related mail to kent at uunet.uu.net.