Lower->Upper in AWK (was: Re: cascading pipes in awk)
David Huelsbeck
dph at lanl.gov
Thu May 25 04:21:23 AEST 1989
From article <818 at manta.NOSC.MIL>, by psm at manta.NOSC.MIL (Scot Mcintosh):
>
> Unfortunately, I only want to uppercase a few selected portions of the
> text my awk program is reading (my original posting contained a
> very simplified example, so this wasn't obvious). There just doesn't
> seem to be a way to have a filter program in the middle of two groups
> of awk statements.
I'm afraid you're right. Perhaps nawk or gawk would help you, but I
really don't know enough about either one to say. However, you
can, somewhat painfully, translate lower to upper or rot13 or
whatever in plain old awk.
Here is my solution to this problem along with a summary of solutions
I received from other awkers when I posted asking for a better way.
Sorry for the length but I felt that every different solution showed
a unique and interesting approach that might be useful in solving other
sorts of problems in awk.
----------------------------------------------------------------------
BEGIN {
    cap["a"] = "A"; cap["b"] = "B"; cap["c"] = "C"; cap["d"] = "D"
    cap["e"] = "E"; cap["f"] = "F"; cap["g"] = "G"; cap["h"] = "H"
    cap["i"] = "I"; cap["j"] = "J"; cap["k"] = "K"; cap["l"] = "L"
    cap["m"] = "M"; cap["n"] = "N"; cap["o"] = "O"; cap["p"] = "P"
    cap["q"] = "Q"; cap["r"] = "R"; cap["s"] = "S"; cap["t"] = "T"
    cap["u"] = "U"; cap["v"] = "V"; cap["w"] = "W"; cap["x"] = "X"
    cap["y"] = "Y"; cap["z"] = "Z"
}
{
    if ($1 ~ /[a-z]+/) {
        new = ""
        last = length($1)
        for (char = 1; char <= last; ++char) {
            cur = substr($1, char, 1)
            if (cur ~ /[a-z]/) {
                new = new cap[cur]
            } else {
                new = new cur
            }
        }
        print new
    }
}
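
A note on the nawk/gawk remark above: POSIX awk, gawk, and newer
versions of nawk provide a built-in toupper(), so a selected part of a
record can be uppercased in place without an external filter. The
following is only a minimal sketch under that assumption; plain old
awk does not have toupper(), and the function name upcase_first is
made up for illustration.

```sh
# upcase_first is a hypothetical name. Assumes the installed awk has
# toupper() (POSIX awk, gawk, newer nawk); old awk lacks it.
upcase_first() {
    # Uppercase only the first field; the rest of the record is untouched.
    awk '{ $1 = toupper($1); print }'
}

printf 'hello world 42\n' | upcase_first
```

Assigning to $1 makes awk rebuild the record with the output field
separator, so runs of blanks between fields collapse to single spaces.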
--------------------------------------------------------------------
From jjm%atavax.decnet at afwl-vax.arpa Mon Mar 21 15:23:18 1988
Date: 21 Mar 88 14:57:00 MST
From: "ATAVAX::JJM" <jjm%atavax.decnet at afwl-vax.arpa>
Subject: AWK answer
To: "dph" <dph at LANL.GOV>
Status: R
Here is the answer.
BEGIN {CAP = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; LOW = "abcdefghijklmnopqrstuvwxyz"}
{
    new = ""
    for (i = 1; i < length($1) + 1; i++)
        if (index(LOW, substr($1, i, 1)) != 0)
            new = new substr(CAP, index(LOW, substr($1, i, 1)), 1)
        else
            new = new substr($1, i, 1)
    $1 = new
    print $0
}
Please note that things could be sped up with
some vars (like x = substr($1,i,1)), etc.
Please let me know how this works for you.
John McDermott
Applied Technology Associates
505/247-8371
Albuquerque
-----------------------------------------------------------------------
From ima!ima.ISC.COM!marc at harvard.harvard.edu Tue Mar 22 06:33:35 1988
Date: Tue, 22 Mar 88 08:30:03 EST
From: marc at ima.isc.com (Marc Evans)
Message-Id: <8803221330.AA17525 at ima.ISC.COM>
To: dph at LANL.GOV
Subject: Re: ATTN: AWK GURUS!!! (lower to upper conversion)
It appears to me that in your examples, there is a specific argument that you
are interested in making the conversion on (e.g. $1). Therefore, saying that
the '| tr \[a-z\] \[A-Z\]' mechanism will not work is too narrow-sighted. If
this is indeed the case, try the following:
BEGIN {...}
($1 ~ /[a-z]+/) { print $1 | "tr '[a-z]' '[A-Z]'" }
(rest of patterns) { ... }
END {...}
In theory, I believe that you should be able to express your rules in the
pattern section, such that the hierarchy of the patterns catches your special
needs, before the patterns below them. Remember, multiple patterns can be
matched, unless the 'next' directive is used (or something similar).
I hope that this may help? 8-)
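
The suggestion above can be sketched as a runnable program. This is
only an illustration: the function name upcase_via_tr is made up, and
close() is assumed to be available in the awk at hand. Note what it
does and does not do: the converted text goes from tr straight to
standard output, and awk never gets it back, which is the limitation
the original question ran into.

```sh
# upcase_via_tr is a hypothetical name. The output of tr goes directly
# to standard output; awk never sees the converted text.
upcase_via_tr() {
    awk '
    BEGIN { cmd = "tr \"[a-z]\" \"[A-Z]\"" }
    $1 ~ /[a-z]/ {
        print $1 | cmd    # send only the first field through tr
        close(cmd)        # assumed available; ends this tr so its output appears now
    }'
}

printf 'hello there\nworld 42\n' | upcase_via_tr
```

Calling close() after each record starts a fresh tr per record. It is
slow, but it keeps the output of tr from being held back until the
end of the run.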
-------------------------------------------------------------------------------
Marc Evans {decvax,inhp4,bbn,harvard}!ima!symetrx!marc
Symmetrix 11 Market Square, Ipswich, MA (617) 356-7811
-------------------------------------------------------------------------------
Date: Tue, 22 Mar 88 16:56:16 EST
From: Dick St.Peters <stpeters%dawn.tcpip at csbvax>
Posted-Date: Tue, 22 Mar 88 16:56:16 EST
Subject: Re: ATTN: AWK GURUS!!! (lower to upper conversion)
This ain't real elegant but is offered for consideration.
It's shorter than your version but undoubtedly slower too.
Dreaming it up was fun.
--
Dick St.Peters
GE Corporate R&D, Schenectady, NY
stpeters at ge-crd.arpa
uunet!steinmetz!stpeters
{
    if ($1 ~ /[a-z]+/) {
        new = ""
        last = length($1)
        for (char = 1; char <= last; ++char) {
            cur = substr($1, char, 1)
            if (cur ~ /[a-z]/) {
                # substr() positions start at 1, so scan 1..26
                for (i = 1; i <= 26; i++) {
                    tmp = substr("abcdefghijklmnopqrstuvwxyz", i, 1)
                    if (tmp == cur) {
                        break;
                    }
                }
                new = new substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", i, 1)
            } else {
                new = new cur
            }
        }
        print new
    }
}
-----------------------------------------------------------------------------
From: bzs at bu-cs.BU.EDU (Barry Shein)
Subject: Re: ATTN: AWK GURUS!!! (lower to upper conversion)
Date: 22 Mar 88 06:58:55 GMT
> Convert possibly mixed-case strings to upper-case.
> (not counting case-less chars like digits)
The attached works under 4.3bsd as you required.
-Barry Shein, Boston University
BEGIN {
    upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    lower = "abcdefghijklmnopqrstuvwxyz";
}
{
    out = "";
    for (i = 1; i <= length($1); i++) {
        if ((cpos = index(lower, c = substr($1, i, 1))) > 0)
            c = substr(upper, cpos, 1);
        out = out c;
    }
    print out;
}
-------------------------------------------------------------------------
From: sjmz at otter.hple.hp.com (Stefek Zaba)
Subject: Re: ATTN: AWK GURUS!!! (lower to upper conversion)
Date: 22 Mar 88 14:54:46 GMT
Make no apology! Your lookup table is a perfectly neat solution given the
weird constraints you've acquired. Personally I'd even avoid the "if lc-alpha"
test, and construct a table with the full 128 (sorry, non-USASCII users!)
characters, using an awk FOR with printf %c, and then overwrite the 26 elements
of interest. This avoids the "if" in your inner loop (though maybe awk table
lookup is slow enough to make the "if" test a win in efficiency, if not
clarity.)
Keep at it - awk's clearly Turing-complete! (**Please**, no TM's-in-awk!!!)
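
The full-table idea described above might look something like this.
It is a sketch only, not anyone's posted solution: the function name
upcase_table is made up, it assumes sprintf("%c", n) yields the
character with code n, and it starts at code 1 rather than 0 because
NUL handling varies between awks.

```sh
# upcase_table is a hypothetical name. Assumes sprintf("%c", n) gives
# the character with code n, which holds in common awks.
upcase_table() {
    awk '
    BEGIN {
        # identity map for codes 1..127 (code 0 is skipped on purpose)
        for (i = 1; i < 128; i++) { c = sprintf("%c", i); cap[c] = c }
        lower = "abcdefghijklmnopqrstuvwxyz"
        upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        # overwrite the 26 lowercase entries with their uppercase forms
        for (i = 1; i <= 26; i++)
            cap[substr(lower, i, 1)] = substr(upper, i, 1)
    }
    {
        new = ""
        last = length($1)
        # every character goes through the table, so no "if" is needed here
        for (i = 1; i <= last; i++)
            new = new cap[substr($1, i, 1)]
        print new
    }'
}

printf 'mixedCase-42 rest\n' | upcase_table
```

Whether the unconditional lookup beats the "if" test is the open
question raised above; timing both on real input is the only way to
tell for a given awk.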