INTRODUCTION AND PROBLEM STATEMENT
The worldwide deaf and hard-of-hearing community was estimated at 250 million people as of the 2005 census. Two-thirds of these people live in developing countries (World Health Organization [WHO], 2005). Deaf people communicate through sign language, which differs from one country to another. In Malaysia, the number of recorded deaf people is 31,000. In the US, deaf people are estimated to make up more than 8.6% of the whole population (Harrington, T., 2004). As the baby-boom generation ages, the number of people with hearing loss could increase quickly and substantially. The deaf and hearing-impaired make up a sizable community with specific needs that operators and technology have only recently begun to target. Consequently, the commercial market has been, and still is, developing software that could bridge the gap between the deaf and non-deaf communities.
There is only one such software commercially available on the market. It is called iCommunicator, and it translates human speech in real time into American Sign Language (ASL) video; but to obtain this software, the deaf person has to pay around USD 4,000, apart from payments for updates (iCommunicator, 2005).
This raises a pressing question: why does a deaf person have to pay such a large amount of money? Is it their sin to be deaf? As just mentioned, two-thirds of the deaf community live in developing countries; how can they obtain such a huge amount of money? Does it mean that only the rich deaf can acquire such software, while the rest have to remain ignorant and isolated?
The aforementioned questions necessitate the development of freely available software that translates human speech into sign language. Such software will not only bridge the gap between deaf and non-deaf people, but will also be a starting point for motivating researchers in this field to realize how much deaf people need our support.
Our design involves two parts. In the first part, the speaker inputs their voice through a microphone, and a speech recognition engine translates this input voice into text. The speech engine we employed for this purpose is Sphinx 3.5, the most recent release of Sphinx 3, which is the successor to the Sphinx-II speech recognition system from Carnegie Mellon University (CMU). Among the reasons for opting for Sphinx 3.5 are the following (Sourceforge, n.d.):
- it is open source, so developers can modify the code freely without any permission, constraints or license;
- it includes both an acoustic trainer and various decoders, i.e., text recognition, phoneme recognition, N-best list generation, etc.;
- it is entirely written in C;
- it works in live mode;
- it is a large-vocabulary, speaker-independent, continuous speech recognizer; and
- it runs about 10 times faster on large-vocabulary tasks than the previous release of Sphinx 3.
In the second part, the output of the speech recognition engine is applied to a sign language database that contains a certain number of pre-recorded sign language video clips and markers. The software we designed matches the recognized text with its corresponding sign language clip, if any, and a marker may be appended depending on the input word. If there is no match, the word is fingerspelled in its entirety. The final output may therefore include a mixture of video signs, markers and fingerspelling.
REVIEW OF RELATED WORK
By intensively consulting the literature, one finds that there is only one commercially available software that converts speech to text to video sign language in real time. This reveals an acute shortage in this field, which in turn induced us to propose our solution. Hereinafter we go through some related work in terms of speech decoding (speech-to-text conversion), text to sign language translation, and finally the available speech to sign language software.
In speech-to-text conversion, the user inputs speech through a microphone and simply gets the uttered words as text. Many software products are commercially available in this context; however, most are neither open source nor free. The cutting edge among them is Dragon NaturallySpeaking 8.0, offered by the commercial market leader in speech recognition (ScanSoft, Inc.). The price of this software is USD 625, and in the case of any new update the user has to pay for it (Dragon (2)). Another product is offered by Microsoft: the Microsoft Speech Application Software Development Kit (SASDK). It is a set of development tools supporting the Speech Application Language Tags (SALT) specification that makes it easier and faster for developers to incorporate speech functionality into Web applications. However, developers cannot access the source code because it is not open source.
In text to sign language dictionaries, the user inputs typed text and the dictionary outputs the corresponding ASL as video. Such dictionaries include the one offered by Michigan State University, called the Personal Communicator CD-ROM (Personal Communicator, 2001); the full dictionary is available on a CD-ROM. Another similar CD dictionary is The American Sign Language Video Dictionary and Inflection Guide, offered by the National Technical Institute for the Deaf (ASL Video Dictionary and Inflection Guide, 2001). Apart from CD dictionaries, online text to animated ASL dictionaries are also available.
Speech to sign language translation is the ultimate goal, where the user utters speech through a microphone and gets the corresponding video sign language in real time. The leader in this field is a company called 1450, Inc., a combination of ScanSoft, Inc. and Interactive Solutions, Inc. It offers software called iCommunicator, the first of its kind and the only available software in this field. Its speech recognition engine is Dragon NaturallySpeaking Professional, V.7.0 (iCommunicator).
An ongoing project at Boston University is the American Sign Language Linguistic Project. They are working on creating a database called SignStream™. A SignStream database consists of a collection of utterances, where each utterance associates a segment of video with a detailed transcription of that video (SignStream™).
Since all these technologies are available, and above all the iCommunicator is already on the market, what are the problems? The issues that we believe need to be researched and resolved can be encapsulated as follows:
Since iCommunicator is the only available software of its kind, and its developer, 1450, Inc., is the only player in the commercial market, there is no real competition to drive the price down.
The only motive for developing the product is its revenue, not the sake of the deaf people, who are part of our society and deserve to get such a product freely, or at least at a reasonable price. Moreover, the buyer is absolutely tied to the developer company, in the sense that for whatever update it offers, the deaf person has to pay extra money.
Since two-thirds of deaf people live in developing countries, it is unrealistic to expect that the majority of them can afford such extremely expensive software.
To tackle these problems, this thesis outlines the design and development of the Speech to Sign Language Interpreter System, or SSLIS for short.
1.5 Research goal
This thesis was proposed in order to overcome the aforementioned shortcomings and problems. The foremost goal of this thesis is to design open source, freely available software for translating input speech into text and sign language in real time. To fulfill this goal, the objectives of this research are:
To make the software open source, in the sense that whoever wants to modify it is entirely free to do so.
This will draw software developers' attention to the needs of the deaf community in this regard and will appeal to their conscience for the benefit of deaf people.
Since this software is freely available, the whole deaf community can acquire it without any constraints or payments.
To fill the gap between deaf and non-deaf people in two senses: firstly, by using this software for the education of deaf people, and secondly, by facilitating communication between deaf and non-deaf people.
To increase spoken language comprehension and improve literacy skills.
To improve the independence and self-confidence of the deaf.
To increase opportunities for advancement and success in education, employment, personal relationships, and public access venues.
To improve quality of life.
Our software is constrained to both the English language and American Sign Language. To clarify, sign languages develop specific to their communities and are not universal. For example, American Sign Language (ASL) is totally different from British Sign Language (BSL), even though both countries speak English. This is because most sign languages develop independently, and each country (and in some cases, each city) has its own sign language. Consequently, our software is constrained to American Sign Language (ASL), but it follows English syntax rather than ASL syntax, which differs completely from that of English. In fact, we follow the Signed English (SE) manual: a manual parallel to English in which basic words are represented by American Sign Language (ASL) signs. In addition, if a non-basic word is found (e.g., an adverb, a plural, etc.), the Signed English manual appends a marker to the basic sign. We have implemented the ASL videos and markers in what we call the sign language database. If the recognized word is out of the database's vocabulary, the word is simply fingerspelled in its entirety instead.
CONTRIBUTIONS AND SOFTWARE CAPABILITIES
The figure below depicts the basic structure of our software.

[Figure: Basic structure of SSLIS]

Two parts are involved: the speech recognition engine and the sign language database. Upon speaking into the microphone, the speech recognition engine converts the uttered English speech into text in real time. The speech engine we employed for this task is Sphinx 3.5, which we have suitably modified and integrated into the graphical user interface of the speech to sign language interpreter system. The engine's output (the recognized text) is fed to the sign language database to find a match on a word-by-word basis. We have included the following in the sign language database: a certain number of ASL video sign clips, where one clip corresponds to one basic English word; the SE markers; and the American Manual Alphabet. An input basic word is matched with its pre-recorded ASL video clip as long as the corresponding clip is available in the database. If the input word is not basic, the basic word is extracted from it and looked up in the database as just described, and then a marker is appended to form the sign corresponding to the original input word. If no clip is available, the recognized word is simply fingerspelled. It should be stressed here that we are not following ASL syntax; rather, we employ the Signed English manual as a manual parallel to English.
The Speech to Sign Language Interpreter System (SSLIS) capabilities can be encapsulated as follows:
- Real time speech to text to on-screen video sign language translation.
- Real time speech to text translation.
- Signing of typed text as on-screen video sign language.
- Ability to control the speed of signing (displayed movies, markers and fingerspelling).
- Control of the speech recognition process through three supplied programs, which the user can select from and execute: Decode, Live Pretend and Live Decode.
- Recognition accuracy calculations in terms of Word Error Rate (WER), with supplementary examples demonstrating WER.
- Description of the sign language database constituents and definitions of the non-basic words that our software is capable of recognizing.
- Ability to control the format of the displayed text, and its foreground and background, to suit the user.
- Ability to sign text while browsing a web page or using a text editor, using the tool we have designed, through simple drag and drop.
The chapters are organized as follows:
Introduction. An introductory chapter that presents the thesis statement, the literature review, the current problems and our proposed solutions to them, the research scope, goal and objectives, the contributions and SSLIS capabilities, and the thesis outline.
State-of-the-Art of Speech Recognition. In this chapter the Large Vocabulary Continuous Speech Recognition (LVCSR) problem is approached. The chapter begins with a definition of Automatic Speech Recognition (ASR), a review of the different types of speech recognizers, a brief history of Speech Recognition (SR) and speech databases, and then goes further into the structure and operation of the speech recognizer, paving the way for the next chapter.
Integration and Modification of Sphinx. In this chapter the speaker-independent, large-vocabulary, continuous speech recognition problem is approached in considerable detail by analyzing the anatomy and operation of Sphinx 3.5, the speech recognition engine we employed to develop our software. Then the source code of Sphinx 3.5 is examined: firstly, the programs included in the open source package are named; secondly, an overview of the decoder is given; finally, the operation of Sphinx 3.5 is explained.
Sign Language. This chapter covers the sign language topic as far as this work is concerned. Firstly, a definition of sign language is given. Secondly, the issue of sign language and deaf education is discussed. Thirdly, the characteristics of sign languages are shown. Fourthly, the topic of American Sign Language (ASL) is discussed in more detail, wherein a description of ASL syntax and the manual alphabet is given. Finally, the chapter goes further into Signed English (SE) as a reasonable manual parallel to English.
Development of SSLIS. This chapter describes the structure of the Speech to Sign Language Interpreter System (SSLIS) and explains the contributions made. Descriptions of the speech recognition engine and the ASL database, their structure, our own contributions, the SSLIS installation, parameter optimization, implementation issues and accuracy measurements are all covered in considerable detail.
This chapter is dedicated to demonstrating how one can use the GUI of SSLIS and to presenting the software capabilities. A full description of SSLIS, with snapshots, is given.
Further Work Suggestions. This is the final chapter, which concludes the thesis. Topics covered include the key findings, shortcomings, conclusions and suggestions for further work.