UNIX Data Manipulation Utilities

Martin A Miller martin at csd4.csd.uwm.edu
Thu Jul 19 00:12:23 AEST 1990


Greetings..

Our site (University of Wisconsin-Milwaukee) has recently purchased a
Convex 220 (BSD 4.3) to replace our ancient UNISYS 1100/81 (Exec-8).  
There is some concern here that some of the [valuable] utilities on the
1100 might not have UNIX analogues.  One particular utility which
has proved to be indispensable is called the "Unified Data Handler" (UDH).
The following is a list of some of the functions of this software:

1)  Append files
2)  Combine adjacent records based on either every n records, or a break
    in a particular variable's value, or type of record.  For example, combine
    two 128-character lines into one 256-character line based on any of the
    above options.
3)  Convert tapes (EBCDIC, packed decimal, etc.) into ASCII files.
4)  Create new variables, based on mathematical and/or conditional statements.
    (i.e., recode)
5)  Match records from two files based on multiple keys, with the capability
    to include or exclude multiple records with the same keys from the two
    files; to do this differently for each of the two files; and to output the
    matched records from both files separately, the unmatched records from
    both separately, and/or a merged file.
6)  Merge files based on multiple keys
7)  Print files, regardless of record length and character type (e.g.,
    binary, packed decimal)
8)  Redescribe data, if necessary, to string manipulation procedures.
9)  Reformat data.
10) Select records based on multiple keys
11) Sequence checking of data; the capability to output the highest or
    lowest record with the same value on a key variable.
12) Sort data based on multiple keys, alpha or numeric, in ascending or
    descending order.
13) Update records in one file based on new and/or additional data in
    another file.  This includes the ability to change existing data to new
    values, and to add new data to records with the same key fields.
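
To make item 5 concrete, here is a minimal Python sketch of matching two
sets of records on multiple keys and reporting matched and unmatched records
from each side separately.  It assumes the records have already been parsed
into dictionaries and fit in memory; the function name, field names, and
sample data are hypothetical, and this is not a description of how UDH itself
works.

```python
def match_records(left, right, keys):
    """Split two lists of dict records into matched/unmatched groups.

    A record is "matched" when the other file holds at least one record
    with the same values for every field named in keys.
    """
    def key_of(rec):
        return tuple(rec[k] for k in keys)

    left_keys = {key_of(rec) for rec in left}
    right_keys = {key_of(rec) for rec in right}

    matched_left = [r for r in left if key_of(r) in right_keys]
    unmatched_left = [r for r in left if key_of(r) not in right_keys]
    matched_right = [r for r in right if key_of(r) in left_keys]
    unmatched_right = [r for r in right if key_of(r) not in left_keys]
    return matched_left, unmatched_left, matched_right, unmatched_right


# Hypothetical sample data keyed on (site, id).
people = [
    {"site": "A", "id": 1, "name": "Ann"},
    {"site": "A", "id": 2, "name": "Bob"},
    {"site": "B", "id": 1, "name": "Cy"},
]
scores = [
    {"site": "A", "id": 1, "score": 90},
    {"site": "B", "id": 2, "score": 70},
    {"site": "A", "id": 1, "score": 85},  # duplicate key, kept as matched
]

ml, ul, mr, ur = match_records(people, scores, ["site", "id"])
```

Duplicate keys are kept on both sides here; dropping or limiting them per
file, as item 5 describes, would be an extra filtering step on each list.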

Note: the above procedures have been performed on data sets of more than
      4,000,000 records with substantial record length (500 characters per
      record, for example) using UDH.
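
For data sets of that size, item 2 (combining adjacent records) is the kind
of operation that can be done one group at a time rather than by loading the
whole file.  The following Python sketch shows both variants, every n records
and on a break in a key value; the function names and the key function are
hypothetical, and the sketch assumes newline-terminated text records.

```python
from itertools import groupby, islice


def combine_every_n(lines, n):
    """Yield one joined record for each run of n consecutive input lines.

    A final short group (fewer than n lines) is still emitted.
    """
    it = iter(lines)
    while True:
        group = list(islice(it, n))
        if not group:
            return
        yield "".join(line.rstrip("\n") for line in group)


def combine_on_break(lines, key):
    """Yield one joined record per run of adjacent lines sharing key(line)."""
    for _, group in groupby(lines, key=key):
        yield "".join(line.rstrip("\n") for line in group)


# Both functions accept any iterable of lines, including an open file
# object, so only one group of records is held in memory at a time.
pairs = list(combine_every_n(["ab\n", "cd\n", "ef\n"], 2))
runs = list(combine_on_break(["A1\n", "A2\n", "B3\n"], key=lambda s: s[0]))
```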


I realize that there *are* UNIX utilities which will perform the
above data manipulation routines, but I am not aware of an integrated
package (perhaps even third-party software) to do *all* these things. I
am also aware of the formidable capabilities of sed and awk to manipulate
data, but it may require a considerable investment for previously non-UNIX
personnel to write sed or awk scripts.  Are there any data handling packages
which might fill the bill?

Please email me in reply - if the mail doesn't get through, please follow up
to comp.unix.questions.


thank you,

-mm


Martin A. Miller
Programmer/Consultant
Social Science Research Facility
University of Wisconsin-Milwaukee
Internet: martin at csd4.csd.uwm.edu
Bitnet  : martin%csd4.csd.uwm.edu at INTERBIT
UUCP    : uunet!martin at csd4.csd.uwm.edu


