command line options
John P. Rouillard
rouilj at umb.umb.edu
Tue Apr 5 05:55:57 AEST 1988
The following structure allows a generic function to parse any
conceivable command line. The structure would have the form:
struct command_entry
{
    struct command_entry *next;  /* for a linked list of these babies */
    char *NAME;                  /* the full name of the option */
    char *ABBREV;                /* the shortest abbreviation for the option */
    char *ARG_TYPE;              /* the type of argument (string, char, int, float ...) */
    char *format_type;           /* Keyword = value, +keyword, -keyword ... */
    void *VARIABLE_addr;         /* the address of a variable to set */
    enum v_type VAR_type;        /* the type of the variable above */
    int *(*FUNCTION_addr)();     /* address of a function returning a pointer to int */
    enum f_type FUNCT_type;      /* the type the function actually returns */
    int (*Error_handler)();      /* your own personal error handler */
    /* add your favorite options here */
};
A possible entry would be (taken from the command make, augmented for show):
{
    .NAME          = "file",
    .ABBREV        = "f",
    .ARG_TYPE      = "string",       /* i.e. char * */
    .format_type   = "-w",           /* "-" specifies "-f"; "w" signifies
                                        whitespace between keyword and value */
    .VARIABLE_addr = &makefile_name,
    .VAR_type      = String,
    .FUNCTION_addr = NULL,           /* no function needed */
    .FUNCT_type    = 0               /* the type of a nonexistent function
                                        (0 assumed to mean "none") */
}
This structure would allow:
a: A long name that can be abbreviated down to the value
in ABBREV.
b: Handling multi-character flags without values (e.g. "-las" in
"ls -las"). Simply loop over each character and set the
appropriate flag.
c: Whitespace elimination (e.g. -Kvalue) is easily done: the value
up to the next whitespace character is scanned according to its
type.
d: The setting of a variable to an argument value or, if a function
is specified, the setting of the variable to the pointer value
returned by the function. (The variable at VARIABLE_addr
is interpreted according to the value in VAR_type, so
appropriate casts can be made.)
e: The ability to handle special parsing of the command line via
calls to a function that takes 1) the current argv location, 2) argc,
and 3) the address of the command_entry list
as arguments.
f: For those values that appear several times on the command line (e.g.
multiple filenames), the function specified in the
command_entry could create a list of the names (copying them if
desired) and then have the variable in the command_entry point
to the head of the list.
g: As an alternative to setting other variables, the values could be
returned in the command_entry structure itself (maybe via a union in
the struct??).
h: The ability to specify in the command entry an error routine
specific for the particular option being parsed.
i: Because a function can be called to deal with the funky parts of
the command line, the function that parses the command line returns
only when it has parsed the whole command line. This eliminates
the problem of dealing with an unparsed remainder, since
it is an error [probably fatal] for it not to parse the whole
command line.
j: The command_entries could be created dynamically during
runtime, or declared statically at compile time.
k: The driver for Options_please (the get_ops lookalike)
would act similarly to an LR or LL parser driver with a parse
table (the linked list of command_entries). The driver is easy to
maintain, with all of the work actually done during the creation
of the parse tables.
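
To make points a, b and k a little more concrete, here is a minimal sketch
of what such a table-driven driver could look like. It is only an
illustration under several assumptions that are not in the proposal above:
the structure is cut down to five fields, the members of enum v_type
(Boolean, String) are made up, ABBREV is assumed to be a prefix of NAME,
every option starts with "-", and a String value is always the next
argv element (the "w" format).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum v_type { Boolean, String };   /* assumed members, not from the post */

struct command_entry {
    struct command_entry *next;    /* linked list, as proposed */
    const char *NAME;              /* full name, e.g. "file" */
    const char *ABBREV;            /* shortest abbreviation, e.g. "f" */
    void *VARIABLE_addr;           /* int * for Boolean, const char ** for String */
    enum v_type VAR_type;
};

/* True if the n-character word is a prefix of NAME no shorter than ABBREV. */
static int matches(const struct command_entry *e, const char *word, size_t n)
{
    return n >= strlen(e->ABBREV) && n <= strlen(e->NAME) &&
           strncmp(e->NAME, word, n) == 0;
}

/* Return the first table entry matching the n-character word, or NULL. */
static struct command_entry *
lookup(struct command_entry *table, const char *word, size_t n)
{
    struct command_entry *e;
    for (e = table; e != NULL; e = e->next)
        if (matches(e, word, n))
            return e;
    return NULL;
}

/* Parse argv[1..argc-1] against the table.
   Returns 0 on success, or the index of the first argument it rejects. */
int Options_please(int argc, char **argv, struct command_entry *table)
{
    int i;
    assert(table != NULL);
    for (i = 1; i < argc; i++) {
        const char *arg = argv[i];
        struct command_entry *e;
        if (arg[0] != '-' || arg[1] == '\0')
            return i;                      /* not an option at all */
        arg++;                             /* skip the '-' */
        e = lookup(table, arg, strlen(arg));
        if (e != NULL && e->VAR_type == String) {
            if (i + 1 >= argc)
                return i;                  /* value is missing */
            *(const char **)e->VARIABLE_addr = argv[++i];
        } else if (e != NULL) {
            *(int *)e->VARIABLE_addr = 1;  /* a single Boolean keyword */
        } else {
            /* point b: a cluster of one-letter Boolean flags */
            for (; *arg != '\0'; arg++) {
                e = lookup(table, arg, 1);
                if (e == NULL || e->VAR_type != Boolean)
                    return i;
                *(int *)e->VARIABLE_addr = 1;
            }
        }
    }
    return 0;
}
```

With entries for "file" (String) and for "long", "all" and "size"
(Boolean, abbreviated l, a, s), a command line such as
"prog -las -fi Makefile.test" would set all three flags and point the
file variable at "Makefile.test". All the knowledge about the options
lives in the table; the driver itself never changes.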
BUGS:
a: The data for the command_entries could take up a lot of
space, which may be troublesome.
b: The second problem occurs because of ambiguity in the
command language. Please follow my description below:
Assume we have defined:
a keyword Kval that can have an optional argument,
and boolean keywords (flags, either on or off) "u" and "e".
How do we parse "-Kvalue"?
Is it Kval with argument "ue", or is it Kval with no
argument followed by the boolean flags "u" and "e"?
If we allow eliding of whitespace between flag and value,
it is impossible to tell which is meant. By doing away
with 'c' above, we can parse this as Kval with no
argument, followed by the flags "u" and "e".
Another ambiguity arises if we decide on having an argument
that can be abbreviated "K" (Kval needs all four letters)
and other arguments "v", "a", and "l". Now how does the
above string parse?
The boolean "K", the boolean "v" ... no wait, those two letters
are the prefix for Kval (ARRGH ;-[) (HELP, LR GRAMMAR)
Richard Harter also touched on this ambiguity problem in his
article.
This is a problem that is inherent in features
a and b above.
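
The ambiguity can be made concrete by counting readings. The sketch below
is not part of the proposal; count_parses is a made-up helper that assumes
every character in "flags" is a one-letter boolean and that "keyword"
takes an optional argument which, with whitespace elided, runs to the end
of the string.

```c
#include <assert.h>
#include <string.h>

/* Count the distinct ways to read s (the text after the '-'). */
int count_parses(const char *s, const char *flags, const char *keyword)
{
    size_t klen = strlen(keyword);
    int n = 0;

    assert(klen > 0);                 /* an empty keyword would never advance */
    if (*s == '\0')
        return 1;                     /* everything consumed: one reading */
    if (strncmp(s, keyword, klen) == 0) {
        n += 1;                       /* keyword, rest of s is its argument */
        if (s[klen] != '\0')          /* keyword with no argument, then more */
            n += count_parses(s + klen, flags, keyword);
    }
    if (strchr(flags, *s) != NULL)    /* a one-letter boolean flag */
        n += count_parses(s + 1, flags, keyword);
    return n;
}
```

For the first case, count_parses("Kvalue", "ue", "Kval") gives 2: Kval
with argument "ue", or Kval followed by the flags u and e. Once K, v, a
and l are also legal single letters, count_parses("Kvalue", "Kvalue", "Kval")
gives 3, the extra reading being six separate flags.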
One way around this is to make sure that you never use the
letters K,v,a, and l :-).
A second way around the problem is to make the order of the
keywords in the list of command_entries significant and
thereby impart a priority to the commands. In the above
example:
if Kval appeared before K (which it would have to
do in order for Kval to be recognized at all) the
interpretation of the flag as Kval would occur first.
A third way around it is to write the table such that no two
command_entries have overlapping differences.
The fourth way is to write a function that will allow the
handling of this via look-ahead or whatever mechanism you
devise. Basically you turn an NFA into a DFA by combining
states. E.g., if a K were found, a function would be called that
would try to determine if the value was Kval or if the value
was K followed by random characters.
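
The second and fourth ways can be sketched in a few lines. Neither
function below comes from the proposal: first_match treats the order of a
(hypothetical) name table as the priority order, and resolve_K is an
example of a hand-written look-ahead hook for the letter K that prefers
the longer keyword.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Second way: return the index of the first name in names[0..n-1] that is
   a prefix of s, or -1 if none is. Earlier entries win over later ones. */
int first_match(const char *const *names, int n, const char *s)
{
    int i;
    assert(names != NULL && s != NULL);
    for (i = 0; i < n; i++)
        if (strncmp(s, names[i], strlen(names[i])) == 0)
            return i;
    return -1;
}

/* Fourth way: look ahead after a K. Returns how many characters of s
   belong to the keyword: 4 for "Kval", 1 for a bare "K", 0 otherwise. */
size_t resolve_K(const char *s)
{
    assert(s != NULL);
    if (strncmp(s, "Kval", 4) == 0)
        return 4;
    if (s[0] == 'K')
        return 1;
    return 0;
}
```

With the table ordered { "Kval", "K", "v", "a", "l" }, first_match picks
Kval for "Kvalue"; with K listed first, K would win and Kval could never
be reached, which is exactly why Kval has to come first.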
If you think this stuff was handled in the Dragon Book, you are
right on the money. But note that the thing that causes all of the
problems is allowing names, and so possibly having non-unique
representations for every string that can be generated.
However, this facility seems to be the only way to even attempt
generality while still allowing a way of working around the problem.
PLEASE NOTE: this is only an idea, and I would like feedback on
it.
Please feel free to steal the idea and modify it as necessary.
Sorry it is so long, but I was trying to reply to everybody's favorite
must-haves.
What do I know? I am only a Physics major.
==========================================================================
The opinions expressed above are all mine and belong to nobody else. To
U-Mass I am just a number.
E = M C**2 Not just an equation a way of life.
John Rouillard U.S. Snail: Physics Department
U-Mass Boston U-Mass Boston
Physics Major Harbor Campus
Boston, MA 02125
UUCP: harvard!umb
husc6!umb