- Authors : Akshay S. Deshpande, Keshav S. Ambulgekar, Kedar R. Joshi, Akshata H. Utgikar
- Paper ID : IJERTV3IS100924
- Volume & Issue : Volume 03, Issue 10 (October 2014)
- Published (First Online): 30-10-2014
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Voice to Voice Language Translation System
Akshay Suresh Deshpande, Keshav Shesharao Ambulgekar, Kedar Raghunath Joshi
Electronics and Telecommunication Engineering, Maharashtra Institute of Technology, Aurangabad, Maharashtra
Akshata H. Utgikar
Assistant Professor, Electronics and Telecommunication Dept.,
Maharashtra Institute of Technology, Aurangabad, Maharashtra
Abstract: At this nascent stage of developing a personalized interpreter, we propose a prototype that uses speech-processing hardware and translators to provide the user with real-time translation. The speech-processing hardware works on the principle of compare and forward: a database already stored in the unit is compared with the input speech, and the result is forwarded for further processing. The need arises from the inability of dictionaries and human translators to meet our needs for better communication. In this situation, the proposed prototype serves the purpose reasonably well and minimizes communication inefficiencies.
Keywords: Language Translator, Microcontroller, Speech Recognition (HM2007), Speech Synthesizer (APR6016).
INTRODUCTION
The global, borderless economy has made it critically important for speakers of different languages to be able to communicate. Speech translation technology, which lets one speak and have one's words translated automatically into the other person's language, has long been a dream of humankind. Speech translation has been selected as one of the ten technologies that will change the world. Hopes are especially high in Japan for a speech-translation system that can automatically translate one's everyday speech: as the use of Japanese has become increasingly international, such speech-translation technology would be a great boon to the nation.
Automatic speech translation consists of three separate technologies: technology to recognize speech (speech recognition), technology to translate the recognized words (language translation), and technology to synthesize speech in the other person's language (speech synthesis). Recent technological advances have made automatic translation of conversational spoken Japanese, English, and Chinese practical for travelers, and consecutive translation of short, simple conversational sentences spoken one at a time has become possible.
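The three-stage pipeline described above can be sketched as a simple function composition. The stage implementations below are placeholder stubs with hypothetical return values, not real engines; a working system would plug in actual recognition, translation, and synthesis components.

```python
def recognize(audio: bytes) -> str:
    """Speech recognition stub: audio in, source-language text out."""
    return "hello"  # placeholder transcript

def translate(text: str) -> str:
    """Language translation stub: source text to target text (toy lookup)."""
    return {"hello": "konnichiwa"}.get(text, text)

def synthesize(text: str) -> bytes:
    """Speech synthesis stub: target text to audio."""
    return text.encode("utf-8")  # placeholder waveform

def speech_to_speech(audio: bytes) -> bytes:
    """Chain the three stages: recognition -> translation -> synthesis."""
    return synthesize(translate(recognize(audio)))
```

The point of the sketch is only the structure: each stage's output is the next stage's input, so the three technologies can be developed and swapped independently.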
This report starts by affirming the significance of speech-translation technology, providing an overview of the state of research and development to date, and reviewing the history of automatic translation technology. It goes on to describe the architecture and current performance of speech translation systems.
SURVEY
The following systems for speech-to-text and speech-to-speech translation are commercially available in the market.
Text To Speech Translator:
The SP0-512 Text to Speech IC is a pre-programmed microcontroller that accepts English text over a serial connection, converts that text to phoneme codes, and then generates audio. It is ideal for adding a robot voice to embedded designs.
VOICE ACTIVATED PHRASE LOOKUP (Text to Speech System):
Voice activated phrase lookup systems are not true speech translation systems by definition. A typical voice activated phrase lookup system is the Phraselator system. The Phraselator is a one-way device that can recognize a set of pre-defined phrases and play a recorded translation.
This device can be ported easily to new languages, requiring only a hand translation of the phrases and a set of recorded sentences. However, such a system severely limits communication, as the translation is one way, reducing one party's responses to simple pointing and perhaps "yes" and "no".
SIGMO (Speech to Speech System):
SIGMO translates 25 languages in real time and has two modes of voice translation. The user first sets the native language and then the language to translate to. Pressing the first button and speaking a phrase makes SIGMO instantly translate and pronounce it in the selected language; pressing the second button makes it translate speech from the foreign language and instantly speak it in the selected native language.
MASTOR (IBM)
MASTOR (Multilingual Automatic Speech-To-Speech Translator) is IBM's highly trainable speech-to-speech translation system, targeting conversational spoken language translation between English and Mandarin Chinese for limited domains. The speech input is processed and decoded by a large-vocabulary speech recognition system. The transcribed text is then analyzed by a statistical parser for semantic and syntactic features. A sentence-level natural language generator based on maximum entropy (ME) modeling is used to generate sentences in the target language from the parser output. The produced sentence in the target language is synthesized into speech by a high-quality text-to-speech system.
Matrix (ATR)
The Spoken Language Translation Research Laboratories of the Advanced Telecommunications Research Institute International (ATR) has five departments, each focusing on a certain area of speech translation.
This system can recognize natural Japanese utterances such as those used in daily life, translate them into English, and output synthesized speech. It runs on a workstation or a high-end PC and achieves nearly real-time processing. Unlike its predecessor ASURA, ATR-MATRIX is designed for spontaneous speech input, and it is much faster. The current implementation deals with a hotel room reservation task. ATR-MATRIX adopted a cooperative integrated language translation model. Because of its small size, light weight, and available attachments, it is portable and easy to use.
PROPOSED SYSTEM
Fig. 1. Block Diagram of the system
The figure shows the block diagram of the Voice to Voice Language Translation System: the input speech is captured by the microphone and passed to the speech processing unit, which processes the input and recognizes the spoken word. The input speech first goes to the speech IC of the speech processing unit.
SPEECH RECOGNITION SYSTEM
Speech recognition is the process of converting an acoustic signal, captured by a microphone or telephone, to a set of words. In this system, the HM2007 is used as the speech recognition unit. The HM2007 is a CMOS voice recognition LSI (Large Scale Integration) circuit. The chip contains an analog front end, voice analysis, regulation, and system control functions. The chip may be used standalone or connected to a CPU.
Speech recognition is divided into two broad processing categories: speaker dependent and speaker independent. Speaker-dependent systems are trained by the individual who will be using the system. These systems are capable of achieving a high command count and better than 95% accuracy for word recognition. The drawback to this approach is that the system responds accurately only to the individual who trained it. This is the most common approach employed in software for personal computers. A speaker-independent system is trained to respond to a word regardless of who speaks; the system must therefore respond to a large variety of speech patterns, inflections, and enunciations of the target word. The command word count is usually lower than in the speaker-dependent case, but high accuracy can still be maintained within processing limits. Industrial applications more often require speaker-independent voice recognition systems.
In this system we use the speaker-independent mode, as it can be used by anyone.
Some features of the HM2007 are as follows:
- Single chip voice recognition CMOS LSI
- Speaker dependent
- External RAM support
- Maximum 40 word recognition (0.96 second)
- Maximum word length 1.92 seconds (20 words)
- Microphone support
- Manual and CPU modes available
- Response time less than 300 milliseconds
- 5V power supply
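The two vocabulary figures in the feature list describe the same fixed pattern-storage budget split two ways: 40 short (0.96 s) patterns or 20 long (1.92 s) patterns. A minimal sketch of that trade-off, with the helper name and budget constant as illustrative assumptions:

```python
# Fixed pattern-storage budget implied by the HM2007 feature list:
# 40 x 0.96 s = 20 x 1.92 s = 38.4 s of trainable speech in either mode.
TOTAL_BUDGET_S = 40 * 0.96

def pattern_slots(pattern_length_s: float) -> int:
    """Number of trainable word patterns for a given pattern length."""
    return round(TOTAL_BUDGET_S / pattern_length_s)
```

Doubling the allowed word length halves the vocabulary, which is why the chip offers 40 words at 0.96 s but only 20 at 1.92 s.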
The speech recognition system is a completely assembled and easy-to-use programmable speech recognition circuit. Programmable in the sense that we train the words (or vocal utterances) we want the circuit to recognize. This board allows you to experiment with many facets of speech recognition technology. It has an 8-bit data output which can be interfaced with any microcontroller for further development.
The input to the system is fed from the microphone to the speech recognizer circuitry, which recognizes the words already stored in the system. The speech recognition and recording system requires an external memory, which is provided by an SRAM. The speech recognition and recording system, along with the static RAM, forms the fundamental block of the speech processing unit. The database is stored in the SRAM, and the speech processing unit is used in recognition mode, where the input is compared with the database and a particular eight-bit BCD address is given as the result. This BCD address is fed to the digital data processing unit. The microcontroller used in this system converts the input address from the HM2007 and processes it so that the address generated at the output specifies the address of the same word in the other language, which is then fed to the APR6016 in order to retrieve the word stored in the synthesizer system.
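The microcontroller's compare-and-forward step is essentially a lookup from a recognition address to a playback sector. A minimal sketch follows; the five-word vocabulary and the sector numbers are hypothetical, while the codes 55/66/77 are the HM2007's documented error results (word too long, word too short, no match).

```python
# HM2007 error results (from the chip's documentation).
ERROR_CODES = {55: "word too long", 66: "word too short", 77: "no match"}

# Hypothetical mapping for illustration: recognition address ->
# APR6016 sector holding the same word pre-recorded in the target language.
ADDRESS_TO_SECTOR = {1: 100, 2: 110, 3: 120, 4: 130, 5: 140}

def translate_address(recognized: int):
    """Forward a recognition result to a synthesizer sector, or report an error."""
    if recognized in ERROR_CODES:
        return None, ERROR_CODES[recognized]
    if recognized in ADDRESS_TO_SECTOR:
        return ADDRESS_TO_SECTOR[recognized], "ok"
    return None, "invalid address"
```

In the real device this table would be burned into the microcontroller's firmware, one entry per trained word.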
SPEECH SYNTHESIZER
In this system we use the APR6016 as the audio record and playback part at the output of the system, as shown in the block diagram. The APR6016 offers non-volatile storage of voice and/or data in advanced multi-level flash memory. Up to 16 minutes of audio recording and playback can be accommodated. The APR6016 memory array is organized to allow the greatest flexibility in message management and digital storage. The smallest addressable memory unit is called a sector. The APR6016 contains 1280 sectors; sectors 0 through 1279 can be used for analog storage. During audio recording, one memory cell is used per sample clock cycle.
The APR6016 stores voice signals by sampling incoming voice data and storing the sampled signals directly into flash memory cells. Each flash cell can hold one of 256 voltage levels; these 256 discrete levels are the equivalent of eight-bit binary encoded values (2^8 = 256). During playback the stored signals are retrieved from memory, smoothed to form a continuous signal, and finally amplified before being fed to an external speaker amplifier. Device control is accomplished through an industry-standard SPI interface that allows a microcontroller to manage message recording and playback.
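The 256-level cell described above behaves like an 8-bit quantizer. A sketch of the idea follows; the voltage range and helper names are illustrative assumptions, not values from the APR6016 datasheet.

```python
LEVELS = 256             # 2**8 discrete levels per multilevel flash cell
V_MIN, V_MAX = 0.0, 3.0  # assumed analog range, for illustration only

def quantize(v: float) -> int:
    """Map an analog sample to the nearest of the 256 cell levels."""
    v = min(max(v, V_MIN), V_MAX)  # clamp to the storable range
    return round((v - V_MIN) / (V_MAX - V_MIN) * (LEVELS - 1))

def dequantize(level: int) -> float:
    """Recover the approximate analog value on playback."""
    return V_MIN + level / (LEVELS - 1) * (V_MAX - V_MIN)
```

The round trip loses at most one quantization step, which is the 8-bit resolution the 256 levels provide.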
The APR6016 is equipped with an internal squelch feature. The squelch circuit automatically attenuates the output signal by 6 dB during quiet passages in the playback material. Muting the output signal during quiet passages helps eliminate background noise, which may enter the system in a number of ways: it may be present in the original signal, arise naturally in some power amplifier designs, or be induced through a poorly filtered power supply.
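The squelch behaviour amounts to halving the amplitude of quiet samples, since -6 dB is a gain of 10^(-6/20) ≈ 0.501. A sketch, with the quiet-passage threshold as an assumption (the APR6016 applies this internally):

```python
QUIET_THRESHOLD = 0.05          # assumed "quiet passage" amplitude
SQUELCH_GAIN = 10 ** (-6 / 20)  # -6 dB is roughly a x0.501 amplitude factor

def apply_squelch(sample: float) -> float:
    """Attenuate quiet samples by 6 dB; pass loud samples unchanged."""
    if abs(sample) < QUIET_THRESHOLD:
        return sample * SQUELCH_GAIN
    return sample
```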
The audio signal containing the content we wish to record should be fed into the differential inputs ANAIN- and ANAIN+. After pre-amplification, the signal is routed into the anti-aliasing filter, which automatically adapts its response based on the sample rate being used, so no external anti-aliasing filter is required. After passing through the anti-alias filter, the signal is fed into the sample-and-hold circuit, which works in conjunction with the analog write circuit to store each analog sample in a flash memory cell.
The APR contains a 20-bit op-code register, of which 14 bits are the sector address and the remaining 5 bits are the op-code of the instruction. The instructions and their op-codes, with a summary of each, are listed in the table below:
TABLE 1. OPERATIONAL CODES

| Instruction Name | Op-code [OP4-OP0] | Summary |
|---|---|---|
| NOP | 00000 | No operation |
| SID | 00001 | Causes the silicon ID to be read |
| STOP | 00110 | Stops the current operation |
| STOP_PWDN | 00111 | Stops the current operation and causes the device to enter power-down mode |
| SET_REC | 01000 | Starts a record operation from the sector address specified |
| REC | 01001 | Starts a record operation from the current sector address |
| SET_PLAY | 01100 | Starts a playback operation from the sector address specified |
| PLAY | 01101 | Starts a playback operation from the current sector |
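A host microcontroller would pack one of these op-codes together with a sector address into the command register. The sketch below assumes the op-code occupies the low 5 bits and the 14-bit address sits above it; the exact bit layout of the 20-bit register is defined in the APR6016 datasheet, so this arrangement is illustrative only.

```python
# Op-codes taken from Table 1 above.
OPCODES = {
    "NOP": 0b00000, "SID": 0b00001, "STOP": 0b00110, "STOP_PWDN": 0b00111,
    "SET_REC": 0b01000, "REC": 0b01001, "SET_PLAY": 0b01100, "PLAY": 0b01101,
}

def command_word(name: str, sector: int = 0) -> int:
    """Pack a 14-bit sector address and a 5-bit op-code into one command word
    (assumed layout: address in bits 5-18, op-code in bits 0-4)."""
    if not 0 <= sector < 1280:  # the APR6016 addresses sectors 0..1279
        raise ValueError("sector out of range")
    return (sector << 5) | OPCODES[name]
```

For example, a SET_PLAY of sector 100 would be sent as the word `(100 << 5) | 0b01100` over the SPI interface under this assumed layout.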
When a SET_REC or REC command is issued, the device begins sampling and storing the data present on ANAIN+ and ANAIN- to the specified sector. After half of the sector is used, the SAC pin drops low to indicate that a new command can be accepted. The device accepts commands as long as the SAC pin remains low; any command received after SAC returns high is queued up and executed during the next SAC cycle.
The SET_REC command begins recording at the specified memory location after the time Tarec has passed. Some time later, the low-going edge on the SAC pin alerts the host processor that the first sector is nearly full. The host processor responds by issuing a REC command before the SAC pin returns high; the REC command instructs the APR6016 to continue recording in the sector immediately following the current one. When the first sector is full, the device automatically jumps to the next sector and returns the SAC signal to a high state to indicate that the second sector is now being used. At this point the host processor issues a STOP command during the next SAC cycle, and the device terminates recording after TSarec. The /BUSY pin indicates when actual recording is taking place. The typical recording sequence is shown below:
Fig. 2. Typical Recording Sequence

When a SET_PLAY or PLAY command is issued, the device begins sampling the data in the specified sector and produces a resultant output on the AUDOUT, ANAOUT-, and ANAOUT+ pins. After half of the sector is used, the SAC pin drops low to indicate that a new command can be accepted. The device accepts commands as long as the SAC pin remains low; any command received after SAC returns high is queued up and executed during the next SAC cycle. Figure 3 shows the typical playback sequence.

Fig. 3. Typical Playback Sequence

SYSTEM FLOW

The system flow (Fig. 4) is as follows: the input speech is compared against the stored database and the matching address is forwarded. If the address is in the range 1-5, the received code is translated, the data is passed to the speech synthesizer, and the voice output is produced. Address 55 indicates the voice was too long, address 66 that it was too short, and address 77 that no match was found (an invalid address). Otherwise the system waits for some delay, checks whether the next input is recognized, and adds the previous and present inputs.

Fig. 4. System Flow Chart
CONCLUSION
The Voice to Voice Language Translation system is a device designed to bridge the language gap between individuals and foreigners traveling in our country. The need arises from the inability of dictionaries and human translators to meet our needs for better communication.
At present we need personalized interpreters that will reduce our dependence on dictionaries and human interpreters, and thereby reduce the hindrance posed by the language barrier. In this situation, the proposed system serves the purpose reasonably well and minimizes communication inefficiencies.
The system can also help overcome the real-time difficulties faced by illiterate people and improve their lifestyle.