Intro to my Master's Thesis

Research Title: Speech to Sign Language Interpreter System (SSLIS)
Level: Master's (Dissertation)
Date of completion: June 2006



The deaf and hearing-impaired make up a sizable community with specific needs that operators and technology have only recently begun to target. The commercial market has been, and still is, working on software to bridge the gap between the deaf and non-deaf communities by translating spoken speech into text and sign language. However, no freely available software, let alone one at a reasonable price, translates uttered speech into sign language in real time. This thesis tackles that problem by presenting the Speech to Sign Language Interpreter System (SSLIS), which translates uttered English speech into video American Sign Language (ASL) in live mode. In addition to its main task, other features were added to make the SSLIS even more comprehensive and beneficial. Sphinx 3.5 serves as the speech recognition engine, and for translation the ASL syntax is not followed; rather, the Signed English (SE) manual is employed as a manual parallel to English. Moreover, for users who are unfamiliar with the Signed English manual, or who wish to learn more about it, the SSLIS supplies a utility for studying it thoroughly. Deaf users can also type words and have the computer utter them. For those interested in speech recognition and how it works, the SSLIS demonstrates the process and even lets users calculate recognition accuracy automatically with a utility designed for this purpose. Other capabilities include the ability to video-sign any text from a web browser or text editor by simply selecting, dragging, and dropping it; to control the speed of the movies; to control the format, foreground, and background colors of the displayed text; and to perform speech recognition in both live and batch modes. We believe the SSLIS will benefit deaf people, facilitate their acquisition of English as a second language, and help fill the gap between the deaf and non-deaf communities.







1.1 Introduction

The worldwide deaf and hard-of-hearing community was estimated at 250 million people as of 2005, and two-thirds of these people live in developing countries (World Health Organization [WHO], 2005). Deaf people communicate through sign language, which differs from one country to another. In Malaysia, the number of recorded deaf people is 31,000, and in the US, deaf people are estimated to make up more than 8.6% of the whole population (Harrington, 2004).


As the baby-boom generation ages, the number of people with hearing loss could increase quickly and substantially. The deaf and hearing-impaired make up a sizable community with specific needs that operators and technology have only recently begun to target. Consequently, the commercial market has been, and still is, working on software that could fill the gap between the deaf and non-deaf communities.


Currently, only one such product is commercially available. It is called the iCommunicator, and it translates human speech in real time into video American Sign Language (ASL); however, to obtain this software a deaf person has to pay around USD 4,000, apart from the payments for updates (iCommunicator, 2005).


This raises pressing questions. Why should a deaf person have to pay such a large amount of money? Is it their fault that they are deaf? As just mentioned, two-thirds of the deaf community live in developing countries; how can they raise such a sum? Does this mean that a wealthy deaf person can acquire such software, while everyone else must remain ignorant and isolated?


All the aforementioned questions necessitate the development of freely available software that translates human speech into sign language. Such software will not only fill the gap between deaf and non-deaf people, but will also be a starting point for motivating researchers in this field to realize how much deaf people need their support.


1.2 The proposed solution

Our software design involves two parts. In the first part, the speaker inputs their voice through a microphone, and a speech recognition engine translates the input voice into text. The speech engine we adopted for this purpose is Sphinx 3.5. Sphinx 3 is the successor to the Sphinx-II speech recognition system from Carnegie Mellon University (CMU), and Sphinx 3.5 is its most recent release. Among the reasons for opting for Sphinx 3.5 are the following (SourceForge, n.d.; Gouvêa, n.d.); a minimal usage sketch follows the list:


it is open source, so developers can modify the code freely without permission, constraints, or licensing fees,

it includes both an acoustic trainer and various decoders, i.e., text recognition, phoneme recognition, N-best list generation, etc.,

it is entirely written in C,

it works in live mode,

it is a large-vocabulary, speaker-independent, continuous speech recognizer, and

it runs about 10 times faster on large-vocabulary tasks than the previous release of the Sphinx 3 decoder.
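To make the engine's role concrete, here is a minimal sketch of a live decoding loop in C. It is a sketch under stated assumptions, not the SSLIS implementation: the ld_* calls follow the live-decode API header (live_decode_API.h) shipped with Sphinx 3.5, but their exact signatures should be treated as assumptions, and read_audio() is a hypothetical stand-in for the host application's microphone capture.

    /* Minimal live decoding loop, assuming the live-decode API header
       (live_decode_API.h) distributed with Sphinx 3.5. read_audio() is a
       hypothetical stand-in for microphone capture. */
    #include <stdio.h>
    #include "live_decode_API.h"

    extern int read_audio(int16 *buf, int max_samples);  /* hypothetical */

    int main(int argc, char *argv[])
    {
        live_decoder_t decoder;
        int16 samples[4096];
        char *hypothesis;
        int n;

        ld_init_with_args(&decoder, argc, argv);  /* models, dictionary, LM */
        ld_begin_utt(&decoder, NULL);             /* start an utterance */

        while ((n = read_audio(samples, 4096)) > 0)
            ld_process_raw(&decoder, samples, n); /* feed raw 16-bit PCM */

        ld_end_utt(&decoder);
        ld_retrieve_hyps(&decoder, NULL, &hypothesis, NULL);
        printf("recognized: %s\n", hypothesis);   /* text for the sign database */

        ld_finish(&decoder);
        return 0;
    }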


      In the second part, the output of the speech recognition engine is applied to a sign language database that contains a certain number of pre-recorded video sign language clips and markers. Our software matches the recognized text with its corresponding sign language clip, if any, and a marker may be appended depending on the input word. When there is no match, the word is fingerspelled in its entirety. The final output may thus be a mixture of video signs, markers, and fingerspelling.



1.3 Literature review

A thorough survey of the literature shows that only one commercially available product converts speech to text to video sign language in real time. This reveals the acute shortage in this field, which in turn induced us to propose our solution. In the following, we review related work in speech decoding (speech-to-text conversion), text-to-sign-language translation, and, finally, the available speech-to-sign-language software.


1.3.1 Speech to text software

In this category, the user inputs speech through a microphone and simply gets the uttered words back as text. Much software is commercially available in this context; however, most of it is neither open source nor free. The cutting edge is Dragon NaturallySpeaking 8.0, offered by the commercial speech recognition market leader ScanSoft (ScanSoft, Inc.). The price of this software is USD 625, and the user has to pay for any new updates (Dragon (2)). Another product is offered by Microsoft: the Microsoft Speech Application Software Development Kit (SASDK), a set of development tools supporting the Speech Application Language Tags (SALT) specification that makes it easier and faster for developers to incorporate speech functionality into Web applications. However, developers cannot access its source code because it is not open source (Microsoft Corporation).


1.3.2 English text to video-ASL dictionaries

Here the user inputs typed text and the dictionary outputs the corresponding ASL as video. Such dictionaries include the one offered by Michigan State University, called the Personal Communicator CD-ROM (Personal Communicator, 2001); the full dictionary is available on a CD-ROM. Another CD dictionary, The American Sign Language Video Dictionary and Inflection Guide, is offered by the National Technical Institute for the Deaf (ASL Video Dictionary and Inflection Guide, 2001). Apart from CD dictionaries, online text-to-animated-ASL dictionaries are also available (HandSpeak™).


1.3.3 Speech to sign language software

This is the ultimate goal: the user utters speech into a microphone and gets the corresponding video sign language in real time. The leader in this field is a company called 1450, Inc., a combination of ScanSoft, Inc. and Interactive Solutions, Inc. It offers software called the iCommunicator, the first of its kind and the only available software in this field. Its speech recognition engine is Dragon NaturallySpeaking Professional, v7.0 (iCommunicator).

Another, still ongoing, project is hosted at Boston University: the American Sign Language Linguistic Research Project. Its researchers are creating a database called SignStream™. A SignStream database consists of a collection of utterances, where each utterance associates a segment of video with a detailed transcription of that video (SignStream™).

Given that all these technologies exist and, above all, that the iCommunicator is already on the market, what are the remaining problems?



1.4 Problem statement

The problems that we believe need to be researched and resolved can be encapsulated as follows:

i. The iCommunicator is the only available software of its kind, and its developer, 1450, Inc., is the only player in the commercial market, so there is no real competition to drive the price down.

ii. The only motive for developing the product is revenue, not the sake of deaf people, who are part of our society and deserve such a product free of charge, or at least at a reasonable price.

iii. The buyer is tied to the developing company: for whatever update it offers, the deaf person has to pay extra.

iv. Since two-thirds of deaf people live in developing countries, it is unrealistic to expect the majority of them to afford such extremely expensive software.

To tackle these problems, this thesis outlines the design and development of the Speech to Sign Language Interpreter System, or SSLIS for short.

1.5 Research goal and objectives

This thesis was proposed to overcome all the aforementioned shortcomings and problems. Its first and foremost goal is to design open source, freely available software for translating input speech into text and sign language in real time.

To fulfill this goal, the objectives of this research are:

i. To make the software open source, so that whoever wants to modify it is entirely free to do so.

ii. To draw software developers' attention to the needs of the deaf community in this regard, and to appeal to their conscience for the benefit of deaf people.

iii. To make the software freely available, so that the whole deaf community can acquire it without constraints or payment.

iv. To fill the gap between deaf and non-deaf people in two senses: first, by using the software for the education of deaf people and, second, by facilitating communication between deaf and non-deaf people.

v. To increase spoken language comprehension and improve literacy skills.

vi. To increase the independence and self-confidence of the deaf person.

vii. To increase opportunities for advancement and success in education, employment, personal relationships, and public access venues.

viii. To improve quality of life.

1.6 Research Scope

Our software is constrained to both the English language and American Sign Language. To clarify, sign languages develop within their own communities and are not universal. For example, American Sign Language (ASL) is totally different from British Sign Language (BSL), even though both countries speak English. This is because most sign languages develop independently, and each country (and in some cases, each city) has its own sign language. Our software is accordingly constrained to American Sign Language (ASL), but it follows English syntax rather than ASL syntax, which differs completely from that of English. In fact, we follow the Signed English (SE) manual, a manual parallel to English in which basic words are represented by their American Sign Language (ASL) signs. When a non-basic word is encountered, i.e., a plural, an adverb, etc., the Signed English manual appends a marker to the basic sign; for example, "books" is rendered as the sign for BOOK followed by the plural marker. We have implemented the ASL videos and markers in what we call the sign language database. If a recognized word is outside the database's vocabulary, it is simply fingerspelled in its entirety instead of signed.



1.7 SSLIS structure and capabilities

Figure 1.1 below depicts the basic structure of our software.


Figure 1.1. Basic structure of SSLIS


Mainly, two parts are involved: the speech recognition engine and the sign language database. Upon speaking into the microphone, the speech recognition engine converts the uttered English speech into text in real time. The engine we adopted for this task is Sphinx 3.5, which we modified and integrated into the graphical user interface of the Speech to Sign Language Interpreter System. The engine's output (the recognized text) is fed to the sign language database to find a match on a word-by-word basis. The database comprises a certain number of ASL video clip signs, where each clip corresponds to one basic English word, the SE markers, and the American Manual Alphabet. A basic input word is matched with its pre-recorded ASL video clip as long as that clip is available in the database. If the input word is not basic, the basic word is extracted from it, looked up as just described, and a marker is appended to form the sign of the original input word. If no clip is available, the recognized word is fingerspelled. It should be stressed that we do not follow ASL syntax; rather, we employ the Signed English manual as a manual parallel to English. The sketch below illustrates this lookup logic.
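As an illustration, here is a minimal, self-contained C sketch of the word-to-sign lookup just described. It is not the SSLIS source: the three-word in-memory table, the clip file names, and the crude plural-only marker check are placeholders for the real sign language database and the full set of SE markers.

    #include <stdio.h>
    #include <string.h>

    /* Tiny stand-in for the sign language database: each basic English
       word maps to a pre-recorded ASL clip file (names are made up). */
    static const char *db_words[] = { "book", "walk", "sign" };
    static const char *db_clips[] = { "book.avi", "walk.avi", "sign.avi" };

    static const char *find_clip(const char *w) {
        for (size_t i = 0; i < sizeof db_words / sizeof *db_words; i++)
            if (strcmp(db_words[i], w) == 0) return db_clips[i];
        return NULL;
    }

    /* "Playing" a clip is simulated with a printout. */
    static void play(const char *what) { printf("[play %s]\n", what); }

    static void fingerspell(const char *w) {
        for (; *w; w++) printf("[fingerspell '%c']\n", *w);
    }

    /* Sign one recognized word: basic word -> clip; inflected word ->
       basic clip + SE marker (only the plural "-s" marker is sketched);
       otherwise fingerspell the whole word. */
    static void sign_word(const char *word) {
        const char *clip = find_clip(word);
        if (clip) { play(clip); return; }

        size_t n = strlen(word);
        if (n > 1 && word[n - 1] == 's') {        /* crude plural check */
            char base[64];
            snprintf(base, sizeof base, "%.*s", (int)(n - 1), word);
            if ((clip = find_clip(base)) != NULL) {
                play(clip);
                play("marker_plural_s.avi");      /* SE plural marker */
                return;
            }
        }
        fingerspell(word);                        /* out of vocabulary */
    }

    int main(void) {
        sign_word("book");   /* exact match        */
        sign_word("books");  /* BOOK + "-s" marker */
        sign_word("hello");  /* fingerspelled      */
        return 0;
    }

Running the sketch plays BOOK for "book", BOOK plus the plural marker for "books", and fingerspells "hello" letter by letter, mirroring the mixture of signs, markers, and fingerspelling described above.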


The overall Speech to Sign Language Interpreter System (SSLIS) capabilities can be encapsulated as follows:

1- Real-time speech to text to on-screen video sign language translation.

2- Real-time speech-to-text translation.

3- Text to on-screen video sign language.

4- Ability to control the speed of signing (displayed movies, markers, and fingerspelling).

5- Demonstration of the speech recognition process through three supplied programs that the user can select from and execute: Decode, Live Pretend, and Live Decode.

6- Calculation of recognition accuracy in terms of the Word Error Rate (WER), with supplementary examples demonstrating it (see the sketch after this list).

7- A list of the sign language database's constituents and definitions of the non-basic words our software is capable of recognizing.

8- Ability to control the format of the displayed text and its foreground and background colors to suit the user's needs.

9- Ability to sign text from a web page or a text editor, using the tool we designed, by simply selecting, dragging, and dropping that text.
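The WER named in capability 6 is conventionally defined as WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-cost word alignment of the recognizer's hypothesis against the reference transcript, and N is the number of reference words. The following self-contained C sketch (illustrative, not the SSLIS utility itself) computes WER via a word-level edit distance; the two sample sentences are made up.

    #include <stdio.h>
    #include <string.h>

    static int min3(int a, int b, int c) {
        int m = a < b ? a : b;
        return m < c ? m : c;
    }

    /* WER = (S + D + I) / N: word-level edit distance between reference
       and hypothesis, divided by the number of reference words N. */
    static double wer(const char *ref[], int n_ref, const char *hyp[], int n_hyp) {
        int d[64][64];  /* assumes utterances under 64 words, for brevity */
        for (int i = 0; i <= n_ref; i++) d[i][0] = i;   /* all deletions  */
        for (int j = 0; j <= n_hyp; j++) d[0][j] = j;   /* all insertions */
        for (int i = 1; i <= n_ref; i++)
            for (int j = 1; j <= n_hyp; j++) {
                int sub = strcmp(ref[i-1], hyp[j-1]) != 0;
                d[i][j] = min3(d[i-1][j] + 1,       /* deletion  */
                               d[i][j-1] + 1,       /* insertion */
                               d[i-1][j-1] + sub);  /* substitution/match */
            }
        return 100.0 * d[n_ref][n_hyp] / n_ref;
    }

    int main(void) {
        const char *ref[] = { "the", "cat", "sat", "on", "the", "mat" };
        const char *hyp[] = { "the", "cat", "sat", "on", "a",   "mat" };
        /* one substitution out of six reference words: WER = 16.7% */
        printf("WER = %.1f%%\n", wer(ref, 6, hyp, 6));
        return 0;
    }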


1.8 Thesis Outline

The thesis chapters are organized as follows:

     Chapter 1: Introduction. This introductory chapter presents the thesis statement, literature review, current problems and our proposed solutions to them, the research scope, goal and objectives, the contributions and SSLIS capabilities, and the thesis outline.

     Chapter 2: State-of-the-Art of Speech Recognition. In this chapter the Large Vocabulary Continuous Speech Recognition (LVCSR) problem is approached. The chapter begins with a definition of Automatic Speech Recognition (ASR), a review of different types of speech recognizers, a brief history of Speech Recognition (SR), and speech databases, and then goes on to the structure and operation of the speech recognizer, paving the way for the next chapter.

     Chapter 3: Integration and Modification of the Sphinx Engine. In this chapter, the speaker-independent, large-vocabulary, continuous speech recognition problem is approached in some detail through an analysis of the anatomy and operation of Sphinx 3.5, the speech recognition engine we adopted for developing our software. Then the source code of Sphinx 3.5 is examined: first, the programs included in the open source package are named; second, an overview of the decoder is given; finally, the operation of Sphinx 3.5 is explained.

     Chapter 4: Sign Language. This chapter covers the sign language topic in depth. First, a definition of sign language is given. Second, the issue of sign language and deaf education is discussed. Third, characteristics of sign languages are presented. Fourth, American Sign Language (ASL) is discussed in more detail, including a description of its syntax and manual alphabet. Finally, the chapter examines Signed English (SE) as a reasonable manual parallel to English.

     Chapter 5: Development of SSLIS. This chapter describes the structure of the Speech to Sign Language Interpreter System (SSLIS) and explains the contributions made. The speech recognition engine and the ASL database, their structure, our own contributions, SSLIS installation, parameter optimization, implementation issues, and accuracy measurements are all covered in detail.

     Chapter 6: System Performance Evaluation. This chapter demonstrates how to use the GUI of SSLIS and showcases the software's capabilities. A full description of SSLIS, with snapshots, is given.

     Chapter 7: Conclusions and Further Work Suggestions. This final chapter concludes the thesis. Topics include key findings, shortcomings, conclusions, and suggestions for further work.



System Flowcharts

Figure 5.2. SSLIS’s database structure and operation

Figure 5.3. Flowchart of the main program

Figure 5.4. Flowchart of the class procedure

Parameter Optimization and Accuracy Measurements

Figure 5.5. Log(beam) vs. Session Number

Figure 5.6. Log(pbeam) vs. Session Number

Figure 5.7. Log(wbeam) vs. Session Number

Figure 5.8. WER vs. Session Number

Figure 5.9. -lw vs. Session Number

Figure 5.10. WER vs. Session Number



Interested in knowing more?

Do you find this an exciting piece of research?

Would you like to know more?


Well, great ... Please do not hesitate to contact me.


Copyright © 2007 by Khalid El-Darymli