UNIX Data Manipulation Utilities
Martin A Miller
martin at csd4.csd.uwm.edu
Thu Jul 19 00:12:23 AEST 1990
Greetings,
Our site (University of Wisconsin-Milwaukee) has recently purchased a
Convex 220 (BSD 4.3) to replace our ancient UNISYS 1100/81 (Exec-8).
There is some concern here that some of the [valuable] utilities on the
1100 might not have UNIX analogues. One particular utility which
has proved to be indispensable is called the "Unified Data Handler" (UDH).
The following is a list of some of the functions of this software:
1) Append files.
2) Combine adjacent records, either every n records or at a break in a
particular variable's value or in the type of record. For example,
combine two 128-character lines into one 256-character line using any
of the above options.
3) Convert tapes (EBCDIC, packed decimal, etc.) into ASCII files.
4) Create new variables, based on mathematical and/or conditional statements.
(i.e., recode)
5) Match records from two files based on multiple keys, with the capability
to include or exclude multiple records with the same keys from the two
files, to do this differently for each of the two files, to output the
matched records from each file separately, and to output the unmatched
records from each file separately and/or output a merged file.
6) Merge files based on multiple keys.
7) Print files, regardless of record length and character type (e.g.,
binary, packed decimal)
8) Redescribe data, if necessary, to string manipulation procedures.
9) Reformat data.
10) Select records based on multiple keys.
11) Sequence checking of data; the capability to output the highest or
lowest record with the same value on a key variable.
12) Sort data based on multiple keys, alpha or numeric, in ascending or
descending order.
13) Update records in one file based on new and/or additional data in
another file. The ability to change existing data to new values, and
to add new data to records with the same key fields.
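To make item 5 concrete, here is a minimal Python sketch of matching records
from two files on multiple keys. It is not how UDH works internally; the
function name match_records, the field names, and the choice to hold records
in memory as dictionaries are all assumptions made for illustration.

```python
def match_records(left, right, key_fields):
    """Split two lists of records into matched and unmatched groups.

    left, right : lists of dicts, one dict per record (hypothetical layout)
    key_fields  : tuple of field names that together form the match key
    """
    def key_of(rec):
        # Build a composite key from the requested fields.
        return tuple(rec[f] for f in key_fields)

    left_keys = {key_of(rec) for rec in left}
    right_keys = {key_of(rec) for rec in right}

    # Records with duplicate keys are all kept; input order is preserved.
    matched_left = [r for r in left if key_of(r) in right_keys]
    unmatched_left = [r for r in left if key_of(r) not in right_keys]
    matched_right = [r for r in right if key_of(r) in left_keys]
    unmatched_right = [r for r in right if key_of(r) not in left_keys]
    return matched_left, unmatched_left, matched_right, unmatched_right
```

The four returned lists correspond to the separate matched and unmatched
outputs described in item 5; a merged file could be produced by joining the
two matched lists on the same composite key.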
Note: the above procedures have been performed on data sets of more than
4,000,000 records with substantial record length (500 characters per
record, for example) using UDH.
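With files of this size, a replacement would probably need to process records
as a stream rather than load them all at once. As a rough sketch of item 2
under that assumption, the following Python generators combine adjacent
records either every n lines or at a break in a key value. The names
combine_every_n and combine_on_break are hypothetical, and records are
assumed to be text lines with the trailing newline already removed.

```python
from itertools import groupby, islice

def combine_every_n(lines, n):
    """Yield one joined record for each run of n consecutive input lines.

    A final short run (fewer than n lines) is still emitted.
    """
    it = iter(lines)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield "".join(chunk)

def combine_on_break(lines, key):
    """Yield one joined record per run of consecutive lines sharing key(line)."""
    for _, group in groupby(lines, key=key):
        yield "".join(group)
```

Both functions hold only one group of lines in memory at a time, so they can
be fed directly from an open file, for example with
combine_every_n((line.rstrip("\n") for line in f), 2) to turn pairs of
128-character lines into 256-character records.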
I realize that there *are* UNIX utilities which will perform the
above data manipulation routines, but I am not aware of an integrated
package (perhaps even third party software) to do *all* these things. I
am also aware of the formidable capabilities of sed and awk to manipulate
data, but it may require a considerable investment for previously non-UNIX
personnel to write sed or awk scripts. Are there any data handling packages
which might fill the bill?
Please reply by email; if the mail doesn't get through, please follow up
to comp.unix.questions.
Thank you,
-mm
Martin A. Miller
Programmer/Consultant
Social Science Research Facility
University of Wisconsin-Milwaukee
Internet: martin at csd4.csd.uwm.edu
Bitnet : martin%csd4.csd.uwm.edu at INTERBIT
UUCP : uunet!martin at csd4.csd.uwm.edu