UNIX PC Voice Power: unlocking the untapped capabilities? (*LONG*)
Charles Brunow
clb at loci.UUCP
Wed Nov 9 07:35:19 AEST 1988
In article <540 at icus.islp.ny.us>, lenny at icus.islp.ny.us (Lenny Tropiano) writes:
> I've posed this before, but now I have proof that it's possible. I've
> spoken with various people (some who were on the original Voice Power
> development team) who couldn't give me "specifics" but said it was
> possible. Voice Recognition, how? That's the question... Since my
> involvement with the Voice Power product on the UNIX pc, I've learned
> a lot. Learning bits and pieces about CODEC's, PCM (pulse code modulation),
> DSP's (digital signal processors), sub bands, mu-law, a-law, etc... It's
> still very technical, and way over my head, but I'm learning... [side note:
> if there is anyone out there who can give me help in the above topics
> please feel free to contact me].
>
If you don't already know this stuff pat then you're years away
from speech recognition (SR). The coding method and companding
are basic stuff which you can find in telco references. There's
a bit in "Transmission Systems for Communications", by "Members
of the Technical Staff - Bell Telephone Laboratories", and you
could profit from "Digital Signal Processing" by Alan V. Oppenheim
and Ronald W. Schafer (Prentice-Hall, 1975). There are bound
to be other references which are basically equivalent.
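
For a rough feel of what companding does, here is a minimal Python sketch
of the continuous mu-law curve with mu = 255. It is an illustration only:
a real CODEC chip typically uses a segmented approximation of this curve
rather than computing logarithms, and the function names below are made up
for this example.

```python
import math

MU = 255.0  # mu value used for 8-bit mu-law telephony

def mulaw_compress(x, mu=MU):
    """Map a linear sample in [-1, 1] onto [-1, 1] with log compression."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress."""
    return math.copysign((math.pow(1.0 + mu, abs(y)) - 1.0) / mu, y)

def quantize8(y):
    """Round a value in [-1, 1] to one of 256 levels (hypothetical helper)."""
    return int(round((y + 1.0) / 2.0 * 255))

def dequantize8(code):
    """Map an 8-bit code back to [-1, 1]."""
    return code / 255.0 * 2.0 - 1.0

def roundtrip_mulaw(x):
    """Compress, quantize to 8 bits, then undo both steps."""
    return mulaw_expand(dequantize8(quantize8(mulaw_compress(x))))

def roundtrip_linear(x):
    """Plain 8-bit linear quantization, for comparison."""
    return dequantize8(quantize8(x))
```

The point of the curve is that quiet samples get a larger share of the 256
codes, so a small input such as 0.01 comes back from roundtrip_mulaw() with
less error than from roundtrip_linear().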
Another source might be the app notes put out by TI a few years
back when they were trying to convince the world that they had
the best speech stuff. Some of it is very specific, like how
the vocal tract simulations work (schematics). My archives are
too confused to find copies so maybe someone else can lay their
hands on a copy for you.
Ultimately the process probably consists of determining the
coefficients for the filter nodes and looking for the best
match with the set of known words and updating the coefficients
either completely or with a damping factor for learning. The
problem is that knowing that doesn't get you much closer to
actually doing it. There is a lot of raw data (assume an 8 kHz
sample rate) which has to be reduced to a form which can be
efficiently processed while keeping enough data to distinguish
similar words from different people. Many people have spent
lots of time on it without significant breakthroughs.
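
To make the outline above a little more concrete, here is a minimal Python
sketch under several assumptions of mine: the "filter coefficients" are taken
to be linear-prediction coefficients computed with the Levinson-Durbin
recursion, each word is represented by a single coefficient vector, and
plain Euclidean distance stands in for whatever measure a real recognizer
would use. The frame length, filter order and all names are hypothetical.
With 160-sample frames at 8 kHz and 8 coefficients per frame, 8000 samples
per second shrink to 50 x 8 = 400 numbers per second.

```python
import math

SAMPLE_RATE = 8000  # Hz, as assumed in the text
FRAME_LEN = 160     # 20 ms per frame (assumption)
ORDER = 8           # coefficients per frame (assumption)

def autocorr(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(frame, order=ORDER):
    """Predictor coefficients a[1..order] via Levinson-Durbin."""
    r = autocorr(frame, order)
    if r[0] == 0.0:
        return [0.0] * order          # silent frame
    a = [0.0] * (order + 1)
    err = r[0]                        # prediction error energy
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:                # signal predicted exactly; stop early
            break
    return a[1:]

def distance(a, b):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(coeffs, templates):
    """Return the known word whose stored coefficients are closest."""
    return min(templates, key=lambda word: distance(coeffs, templates[word]))

def update_template(old, new, damping=0.9):
    """damping=0 replaces the template outright; near 1 it learns slowly."""
    return [damping * o + (1.0 - damping) * n for o, n in zip(old, new)]
```

A real word spans many frames and needs some kind of time alignment before
matching, which this sketch ignores entirely. That is a large part of why
knowing the outline doesn't get you much closer to doing it.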
--
CLBrunow - KA5SOF
clb at loci.uucp, loci at csccat.uucp, loci at killer.dallas.tx.us
Loci Products, POB 833846-131, Richardson, Texas 75083