upper/lower case filter

Jim Hester hester at ICSE.UCI.EDU
Tue Nov 12 11:59:43 AEST 1985


UNIX provides this facility with the 'tr' (translate characters) program.
To change everything to upper case, use

    tr a-z A-Z

I don't know what effect the ranges have if the letters are not
contiguous (as in an IBM character code I won't name).  If that is a
problem, just list all 26 letters explicitly in each argument, lower
case in the first and upper case in the second.

If the files are reasonably large, a more efficient algorithm (than
checking character types during input) is a table lookup scheme like
the following (which is the basic method used by tr):

    #include <stdio.h>
    #include <ctype.h>

    #define NCHARS 256

    int main(void)
    {
        int table[ NCHARS ], ch;

        for ( ch = 0 ; ch < NCHARS ; ++ch ) {
            if ( islower(ch) )
                table[ch] = toupper(ch);
            else
                table[ch] = ch;
        }
        while ( EOF != (ch = getchar()) ) putchar( table[ch] );
        return 0;
    }

In a few quick tests, table lookup took 3/4 of the time taken by
checking the character type of each input character.
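
For concreteness, the two approaches being compared can be sketched as
a pair of buffer-to-buffer routines.  This is only an illustration, not
the code that was timed: the names upcase_check and upcase_table are
made up here, and the buffers are unsigned char so that every input
byte is a valid table index.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

#define NCHARS (UCHAR_MAX + 1)

/* Per-character check: classify every input byte as it is seen. */
void upcase_check(const unsigned char *in, unsigned char *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        out[i] = islower(in[i]) ? (unsigned char)toupper(in[i]) : in[i];
}

/* Table lookup: classify each possible byte value once, then index. */
void upcase_table(const unsigned char *in, unsigned char *out, size_t n)
{
    unsigned char table[NCHARS];
    size_t i;
    int ch;

    for (ch = 0; ch < NCHARS; ++ch)
        table[ch] = islower(ch) ? (unsigned char)toupper(ch)
                                : (unsigned char)ch;
    for (i = 0; i < n; ++i)
        out[i] = table[in[i]];
}
```

Both routines should produce the same output for the same input; the
second pays for islower() and toupper() once per table entry rather
than once per input byte.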

When alphabetic characters are contiguous, as in ASCII (which implies a
constant difference between the two cases of each letter, which you
took advantage of), the initialization loop can be sped up by
eliminating the 256 calls to islower() and 26 calls to toupper().
Simply remove the first three lines of the loop body (leaving only
table[ch] = ch;) and add a new loop:

    shift = 'A'-'a';
    for ( ch = 'a' ; ch <= 'z' ; ++ch ) table[ch] += shift;
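
Put together, the shift-based initialization might look like the sketch
below.  The function name init_table_shift is hypothetical, and the
assert is an addition of this sketch: it is a cheap check that the
letters span exactly 26 codes, which rules out character sets with gaps
in the alphabet.

```c
#include <assert.h>
#include <limits.h>

#define NCHARS (UCHAR_MAX + 1)

/* Fill table[] with an upper-casing map, assuming 'a'..'z' and
   'A'..'Z' are each contiguous (true for ASCII). */
void init_table_shift(int table[NCHARS])
{
    const int shift = 'A' - 'a';    /* -32 in ASCII */
    int ch;

    /* Contiguity assumption: fails if the alphabet has gaps. */
    assert('z' - 'a' == 25 && 'Z' - 'A' == 25);

    for (ch = 0; ch < NCHARS; ++ch)
        table[ch] = ch;             /* identity map first */
    for (ch = 'a'; ch <= 'z'; ++ch)
        table[ch] += shift;         /* move lower case onto upper case */
}
```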

Also, if the character code uses a single bit to distinguish character
case, you can speed it up even more by just ANDing or ORing a mask into
the appropriate entries of the table (here, clearing the case bit to
get upper case):

    mask = ~('a'-'A');
    for ( ch = 'a' ; ch <= 'z' ; ++ch ) table[ch] &= mask;
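
A sketch of both directions, AND to clear the case bit and OR to set
it, follows.  The function names are hypothetical, and the asserts
encode the assumptions of this sketch: the case difference is a single
bit, and that bit is set in lower-case letters, as in ASCII.

```c
#include <assert.h>
#include <limits.h>

#define NCHARS (UCHAR_MAX + 1)

/* Upper-casing table: clear the case bit in the entries for 'a'..'z'. */
void init_table_and(int table[NCHARS])
{
    const int bit = 'a' - 'A';      /* 0x20 in ASCII */
    int ch;

    /* Assumptions: exactly one bit differs, and lower case has it set. */
    assert(bit > 0 && (bit & (bit - 1)) == 0);
    assert(('a' & bit) != 0 && ('A' & bit) == 0);

    for (ch = 0; ch < NCHARS; ++ch)
        table[ch] = ch;
    for (ch = 'a'; ch <= 'z'; ++ch)
        table[ch] &= ~bit;
}

/* Lower-casing table: set the case bit in the entries for 'A'..'Z'. */
void init_table_or(int table[NCHARS])
{
    const int bit = 'a' - 'A';
    int ch;

    assert(bit > 0 && (bit & (bit - 1)) == 0);
    assert(('a' & bit) != 0 && ('A' & bit) == 0);

    for (ch = 0; ch < NCHARS; ++ch)
        table[ch] = ch;
    for (ch = 'A'; ch <= 'Z'; ++ch)
        table[ch] |= bit;
}
```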

Either or both of these speedups have a negligible effect on the
runtime for large inputs since, being used only during a constant-time
initialization step, they are independent of the input size.  It's
probably better to stick with something closer to the original code I
gave, for reasons of simplicity and portability.

	Jim


