Very useful binary file analyser to share
tam at cronos.metaphor.com
Thu Sep 20 10:06:35 AEST 1990
I have developed a utility that has been extremely useful in my last few years
as a software developer, and I would like to share it with you now. However, I
don't know a good way to distribute binaries (sorry, I don't want to give away
sources). I have included a user guide here; if any of you are interested,
let me know how to send it to you. I have versions for SunOS, AIX, DOS and
OS/2 (it will run on others that I have access to); please specify which one
you want.
ANA Command summary
Prepared by: Paul C. Tam
For Version 0.15
Printed on 24 July 1990
I was rushing to finish this document, so some parts may be confusing. I
would appreciate any comments on or enhancements to this document. This
version of ANA is free; please feel free to copy it.
0 HIGHLIGHTS
* Interpret binary data in structures YOU define.
* Rearrange data bytes before interpretation.
* Report current machine data types.
* Dump binary data in a very flexible format.
* Dump multiple files on the same screen.
* Same user interface across various platforms.
* Built-in calculator/converter.
* Save output to disk for future use.
* Search for patterns.
* Execute operating system commands without exiting the utility.
* And more......
1 INTER-OPERABILITY
Interoperability seems to be a hot buzzword these days, and this software
delivers just that. Since the software is extremely portable, there are
versions running on almost any operating system that has a C compiler. They
have exactly the same look and feel across all platforms.
2 Introduction
ANA is a utility program to assist users (especially software developers)
who are interested in ANAlyzing the binary contents of any file. This program
may be easier for users who know C, since the terminology used here is C-like.
Its primary function is to display the hexadecimal contents of any file
interactively. On top of that, many features are built in to make this
utility more flexible and useful. These include the ability to dump the
display buffer into a file, set the display length and base, pack the
display and search for a combination of bytes (search has not yet been built).
A unique feature of this utility is perhaps its ability to analyze
structures. This feature is especially geared toward software developers.
Sometimes a data file is an array of records, and each record contains
information of different types. For example, the data file may be the control
file of a print queue. It holds one record for each file waiting to be
printed. Each record in turn contains different fields; these fields may
indicate the name of the file to be printed, its priority and so on. They
may have a data type of character (1 byte), integer (2 or 4 bytes) or ASCII
string.
Using ANA, a user can create an ASCII file in which the structure is defined.
ANA then maps the data file onto the structure and interprets it as a series
of fields instead of a string of bytes.
3 How to invoke ANA
ANA can be invoked in any one of the following ways:
1) ANA
2) ANA <data_file_name>
3) ANA <data_file_name> <start_address> <length_of_buffer>
4 Inputs
Inputs can be hexadecimal, decimal or ASCII. Numerical inputs are
interpreted according to the default base; however, this can be overridden
by a prefix. Any input prefixed by 0x is always hex no matter what the
current default is, and any input prefixed with \ is always decimal.
A single ASCII character must be between single quotes. An ASCII string,
however, must be between a pair of delimiters, which can be any character.
For example, the command s strings is the same as the command s 'tring': in
the first the delimiter is the letter s, in the second it is the single
quote, and both search for "tring".
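The input rules above can be sketched in a few lines of Python. This is only
an illustration of the rules as described, not ANA's own code; the function
names parse_number and parse_string are hypothetical, and the current input
base is passed in explicitly because this guide does not state its default.

```python
def parse_number(token, default_base):
    """Interpret a numeric token using the prefix rules described above."""
    if token.lower().startswith("0x"):
        return int(token[2:], 16)      # 0x prefix: always hexadecimal
    if token.startswith("\\"):
        return int(token[1:], 10)      # backslash prefix: always decimal
    return int(token, default_base)    # otherwise use the current input base


def parse_string(token):
    """Strip a matching pair of delimiter characters from a string token."""
    if len(token) < 2 or token[0] != token[-1]:
        raise ValueError("string must sit between a pair of identical delimiters")
    return token[1:-1]


print(parse_number("0x1F", 10), parse_number("\\31", 16), parse_number("1F", 16))
print(parse_string("strings"), parse_string("'tring'"))
```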
5 Report
Unpacked -
0x00000000: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F |................|
0x00000010: 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F |................|
Packed -
000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
The report format is fairly flexible. The report address and the report data
can each be shown in either hexadecimal or decimal. The format above varies
depending on a number of parameters, which can be set by various commands.
The following defaults apply unless overridden by the corresponding commands.
Parameters      Defaults       Commands
Pack Mode       Unpacked       p (packed)
Address Base    Hexadecimal    b a (base)
Data Base       Hexadecimal    b d
Buffer size     240 bytes      l (length)
Report width    16 bytes       w (width)
Start address   0              a (address)
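As an illustration of the two report layouts, here is a minimal Python sketch
that produces lines in the same shape as the unpacked and packed samples
above. It is not ANA's implementation; the function name dump and its
parameters are hypothetical, and it only handles hexadecimal address and data
bases.

```python
def dump(data, start=0, width=16, packed=False):
    """Format bytes as report lines, `width` bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        if packed:
            # Packed: hex digits only, no address and no ASCII column.
            lines.append(row.hex().upper())
        else:
            hex_part = " ".join(f"{b:02X}" for b in row).ljust(width * 3 - 1)
            # Non-printable bytes are shown as '.' in the ASCII column.
            text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
            lines.append(f"0x{start + offset:08X}: {hex_part} |{text}|")
    return lines


for line in dump(bytes(range(32))):
    print(line)
for line in dump(bytes(range(32)), packed=True):
    print(line)
```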
6 Command Descriptions
6.1 ? - Help
Display a brief description of the available commands. This is useful for
reviewing commands.
6.2 ENTER - Display next buffer
Data is read from file into the data buffer and displayed. Then the
next starting address is updated so that the next ENTER will display
the following data.
6.3 l - Set new buffer length
Define the size of the data buffer on the next display.
6.4 a - Set new starting address
Define a new starting address in the data file rather than continuing
from the end of the last display buffer.
6.5.1 b a - Toggle report address base
In unpacked mode, the address of the first byte of each report line
is shown. This address can be in hexadecimal or decimal. This
command toggles the base.
6.5.2 b d - Toggle report data base
Reported data can be in hexadecimal or decimal. This command
toggles the base.
6.5.3 b i - Toggle input base
All numerical inputs are interpreted in the current default base; this
command toggles that base. However, inputs prefixed with 0x are always
interpreted as hex and inputs prefixed with \ are always decimal.
6.6 c - Calculator functions
Sometimes it is necessary to do some arithmetic on the data
displayed. A simple set of arithmetic functions is available in ANA.
Currently, the calculator can only do integer arithmetic and is
limited to two operands and one operator (the one exception is
conversion, which takes a single operand). The syntax of this command
is the command keyword followed by the operation followed by ENTER.
The following are examples and descriptions of all available
operations. Suppose X and Y are two integers.
c X * Y    ( X multiplied by Y )
c X / Y    ( X divided by Y )
c X + Y    ( X plus Y )
c X - Y    ( X minus Y )
c X % Y    ( remainder of X divided by Y )
c X & Y    ( X bitwise AND Y )
c X | Y    ( X bitwise OR Y )
c X ^ Y    ( X bitwise XOR Y )
c X > Y    ( X shifted right by Y bits )
c X < Y    ( X shifted left by Y bits )
c X        ( conversion; X can be hex, decimal or ASCII )
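The two-operand operations listed above can be modelled with a small lookup
table. The sketch below is an illustration only: the names OPERATIONS and
calculate are hypothetical, and Python's integer division and remainder
round toward negative infinity, which may differ from ANA's behaviour for
negative operands.

```python
import operator

# One entry per operator character in the list above.
OPERATIONS = {
    "*": operator.mul,
    "/": operator.floordiv,   # integer division (see note on negatives above)
    "+": operator.add,
    "-": operator.sub,
    "%": operator.mod,
    "&": operator.and_,
    "|": operator.or_,
    "^": operator.xor,
    ">": operator.rshift,     # a single '>' means shift right
    "<": operator.lshift,     # a single '<' means shift left
}


def calculate(x, op, y):
    """Apply one operator to two integer operands."""
    return OPERATIONS[op](x, y)


print(calculate(7, "/", 2), calculate(7, "%", 2), calculate(1, "<", 4))
```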
6.7 d - Download structure description file
Each structure description file maps only one structure, but sometimes
it is desirable to map data to a different structure. This command
loads another description file for the next mapping.
6.8 D - continuously dump
The whole work file starting at current location will be displayed
continuously until the end of the file.
6.9 i - information desk
This command displays useful information: the sizes in bytes of the
data types on the current machine, the current work file name, number
and size, the maximum number of work files that can be open, the
number of work files currently open, and the user input base, report
data base and report address base. It also shows the mapping
alignments (see the m command).
6.10 m - Map data to structure
Maps the data in the data buffer just displayed onto the structure
described by the SDF (structure definition file, see section 7).
Mapping currently starts at the beginning of the data buffer, so the
user may have to adjust the starting address before mapping.
Data types are normally aligned within a structure. For example, a
'short' after a 'char' will be put on an even boundary, and the byte
after the 'char' is meaningless. This utility allows the user to
specify the alignment boundary. The arguments are i for int, l for
long, f for float and d for double. Their default values are displayed
by the information desk ('i').
6.11 o - open another work file
More than one file can be worked on at a time; this command opens
another work file.
6.12 p - Toggle packed display mode
As discussed above, the report format can be either packed or
unpacked; this command toggles between the two.
6.13 q - Quit analyzer
Terminate and exit program.
6.14 s - search for a pattern
A pattern is searched for, starting at the current location. The
pattern can be a series of hex or decimal numbers, or an ASCII string
in a pair of delimiters.
6.15 t - Transfer data buffer to disk
The buffer just displayed can be stored in a disk file with this
command. At the first execution of this command, the user will be
prompted for the disk file name unless it is entered with the command.
Any subsequent transfer will be appended to the named file, and any
file name entered on the command line will be ignored.
6.16 u - use a different work file
If multiple work files are open (see the o command), this command
is used to switch to a different work file.
6.17 V - Display current version
This command displays the current version of the software and
copyright message.
6.18 w - Set display row width
Especially after changing from unpacked to packed display format, it
is usually desirable to display more data on one line. This command
allows the user to adjust the display width.
6.19 z - zap old data with new data
Be careful when using this command: it replaces the old data at the
current location with the new data, and there is no recovery from it.
As with the 's' command, the data can be hex, decimal or ASCII.
6.20 ! - OS escape
Run a regular Operating System command.
6.21 0 - Redisplay buffer
Sometimes data may scroll off the screen; this command redisplays
the data that was just displayed.
6.22 + - report the next display buffer
6.23 - - report the previous display buffer.
7 Structure mapping
Structure mapping is one of the unique features of this software. Rather
than just dumping the data file in bytes, the user can define a structure
definition file (from here on called the SDF) from which the data can be
interpreted in a more flexible way. The SDF is a pure ASCII file in which
each line represents one data field and the whole file together defines a
structure to be mapped.
This feature is used in the following steps:
First, create the SDF with any editor. This file must be named
"ana.fmt".
Second, display the beginning of the data structure by changing the start
address and hitting ENTER.
Finally, activate the mapping command to map the data buffer.
Each line in the SDF represents one data type field, every line has the
following format:
keyword user_defined_id [length/byte_arrangement]
For all types except "string", the optional third field is for byte
rearrangement. For data type string, a length field has to be specified
to indicate how many bytes are in the string. The user-defined name helps
the user identify the field; its content is arbitrary and is limited to
20 characters. Names longer than 20 characters will be truncated.
The keywords currently supported are:
int       signed integer
char      single character (byte)
string    string of characters
long      long signed integer
short     short signed integer
ulong     unsigned long integer
ushort    unsigned short integer
uint      unsigned integer
float     floating point number
double    double floating point number
To allow more flexibility, it is also possible to interactively download
a new SDF so that more than one structure can be analyzed in a data file.
EXAMPLE:
Suppose there is a file of employee records, each record starts with an
employee name of 10 characters, then an employee number of type long,
followed by his salary which is of type integer. Let's further assume that
integer is two bytes and a long integer is four bytes. Instead of just
dumping the data file in bytes, it is more useful to dump them in a more
descriptive form, in this case a string, a long integer and an integer.
SDF should look like this:
string employee_name 10
long employee_no.
int employee_salary
When the ANA utility is executed, it detects the existence of the SDF,
builds the internal structure and enables the mapping facility.
The user can display the portion of data that needs to be analyzed and then
activate the mapping command. The display may look something like:
employee_name = (John Doe )
employee_no. = 999999 (0xF423F)
employee_salary = 5000 (0x1388)
Notice that the string is shown inside a pair of parentheses. The integers
are shown in decimal, followed by their corresponding hexadecimal values.
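To make the example concrete, the following Python sketch parses the same
three-line SDF and maps a hand-built record to output in the same shape. It
is a sketch under stated assumptions, not ANA's implementation: the names
parse_sdf and map_record are hypothetical, the sizes follow the example
(2-byte int, 4-byte long), the bytes are assumed to be stored most
significant first, and alignment padding and byte rearrangement are ignored.

```python
import struct

SDF_TEXT = """\
string employee_name 10
long employee_no.
int employee_salary
"""

# Assumed for this example only: 4-byte long, 2-byte int, big-endian, signed.
FORMATS = {"long": ">l", "int": ">h"}


def parse_sdf(text):
    """Turn each SDF line into (keyword, name, optional third field)."""
    fields = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        third = parts[2] if len(parts) > 2 else None
        fields.append((parts[0], parts[1][:20], third))  # names cut at 20 chars
    return fields


def map_record(fields, data):
    """Walk the buffer field by field and format each value."""
    lines, pos = [], 0
    for keyword, name, third in fields:
        if keyword == "string":
            length = int(third)          # for strings the third field is a length
            text = data[pos:pos + length].decode("ascii")
            lines.append(f"{name} = ({text})")
            pos += length
        else:
            fmt = FORMATS[keyword]
            (value,) = struct.unpack_from(fmt, data, pos)
            lines.append(f"{name} = {value} (0x{value:X})")
            pos += struct.calcsize(fmt)
    return lines


record = b"John Doe  " + struct.pack(">l", 999999) + struct.pack(">h", 5000)
for line in map_record(parse_sdf(SDF_TEXT), record):
    print(line)
```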
Users should be aware that computers normally place data on the boundary
appropriate to its type. For example, on a machine that uses a two-byte
integer, a structure like
structure {
character A;
integer 1;
}
may be stored as follows:
41 XX 00 01
or
41 00 01
In the first case, the character falls on an even boundary; since the
integer also has to start on an even boundary, there is a garbage byte in
between that does not mean anything. In the second case, the character
falls on an odd boundary, and therefore the integer can be put right after
the character.
For this reason, it is sometimes very confusing to look at the data byte
by byte, and it is better to use structure mapping.
Furthermore, different CPUs have different characteristics. Some align
both integer and long on an even boundary, while others might align
integer on an even boundary and long on a 4-byte boundary. This software
allows the user to customize the alignment. The command m [i/l/f/d]
specifies the alignment of the data type used in structure mapping.
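The effect of an alignment boundary on field positions can be sketched as
follows. This is an illustration of the padding rule described above, not
ANA's code; the function name layout and the (name, type, size) tuples are
hypothetical.

```python
def layout(fields, alignment, start=0):
    """Return the offset of each field, skipping padding up to its boundary.

    fields:    list of (name, type_name, size_in_bytes)
    alignment: maps a type name to its boundary; unlisted types use 1
    """
    offsets, pos = {}, start
    for name, type_name, size in fields:
        boundary = alignment.get(type_name, 1)
        pos += (-pos) % boundary     # garbage bytes needed to reach the boundary
        offsets[name] = pos
        pos += size
    return offsets


fields = [("A", "char", 1), ("n", "int", 2)]
print(layout(fields, {"int": 2}, start=0))   # char on an even address
print(layout(fields, {"int": 2}, start=1))   # char on an odd address
```

With the character at an even address, one padding byte separates it from
the integer; with the character at an odd address, the integer follows
immediately, matching the two storage layouts shown earlier.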
To make life harder, some CPUs swap bytes (our beloved 80x86 architecture)
and some don't. The optional third field in the format file is there for
just that. It specifies the data rearrangement sequence. For example, if an
integer made of 4 bytes is stored in a file as 0x11 0x22 0x33 0x44
(0x11223344), a line in the format file:
int sample
yields an output of
sample = 287454020 (0x11223344)
but if the format file is written as:
int sample 4321
the output will be
sample = 1144201745 (0x44332211)
This feature is really useful if, for example, someone tries to dump a file
created on a 68000 machine on an 8086 machine.
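The rearrangement in the example above can be reproduced with a short Python
sketch. It assumes that each digit of the sequence names which stored byte
(counting from 1) is taken next, and that the rearranged bytes are then read
most significant first; the guide does not spell out these semantics, so
treat both as assumptions. The function name rearrange is hypothetical.

```python
def rearrange(raw, order):
    """Pick bytes from `raw` in the order given by 1-based digits."""
    return bytes(raw[int(digit) - 1] for digit in order)


raw = bytes([0x11, 0x22, 0x33, 0x44])

as_stored = int.from_bytes(raw, "big")
swapped = int.from_bytes(rearrange(raw, "4321"), "big")

print(f"sample = {as_stored} (0x{as_stored:X})")
print(f"sample = {swapped} (0x{swapped:X})")
```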