Design, Development and Implementation of an Automated IVR System with feature based TTS using Open Source Tools

DOI : 10.17577/IJERTV1IS3052


1Anil Kumar, 2S. Niranjan

1Ph.D. scholar, CMJ University, Shillong; 2Professor, PDM College of Engg., Bahadurgarh

Abstract

This paper describes the concept of an Interactive Voice Response (IVR) system with a feature-based Text-to-Speech (TTS) system built using open source software and tools. The text-to-speech system converts input text into the desired speech; it is used to read user-supplied text aloud in various languages.

Interactive Voice Response (IVR) systems are complex network elements providing a comprehensive set of features and functionality in order to complement or even substitute human call agents. The IVR system is very often the only contact a caller has with a company when requesting a service, such as reserving a ticket for a movie. It is therefore very important that the IVR system provides high quality in terms of robustness, stability, correctness of the menu branches, and quality of the voice announcements. This paper discusses key objectives and requirements for the efficient and comprehensive testing of Interactive Voice Response (IVR) systems.

Introduction: IVRs & TTS

IVRs (Interactive Voice Response systems) represent a powerful means for automating business and customer-facing processes. IVR is an automated telephony technology used to interact with clients or customers through the phone keypad or voice commands. IVR systems process phone calls, play pre-recorded messages, provide callers with real-time data from any number of databases, and potentially route calls to service agents. IVR technology requires virtually no human interaction over the telephone, as the user's interaction with the database is predetermined by what the IVR system allows the user to access. For example, banks and credit card companies use IVR systems so that their customers can receive up-to-date account information instantly and easily without having to wait to speak with someone directly. IVR is mostly used around the clock, 365 days a year, in the BPO, KPO, and banking sectors, as well as in-house and in offices, to save money and employee resources where no emergency inquiry or service is required. An IVR system is very useful where a limited set of inquiries is handled, such as checking bank balances, managing credit cards, checking store hours or locations, or ordering prescription medicine. IVR systems can combine touch-tone input, speech recognition, and text-to-speech capabilities, resulting in high customer satisfaction and operational effectiveness.
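The call flow described above, in which the caller's key presses select among a fixed set of prompts, can be sketched as a small state machine. The following Python sketch is illustrative only: the menu names, the prompt wording, and the `account` dictionary are hypothetical and are not part of the system described in this paper.

```python
# Minimal sketch of a touch-tone IVR menu as a state machine.
# All menu names, prompts, and account fields are hypothetical.

MENUS = {
    "main": {
        "prompt": "Press 1 for account balance, 2 for store hours, 0 for an agent.",
        "options": {"1": "balance", "2": "hours", "0": "agent"},
    },
    "balance": {"prompt": "Your balance is {balance}.", "options": {}},
    "hours": {"prompt": "We are open 9 to 5, Monday to Friday.", "options": {}},
    "agent": {"prompt": "Transferring you to an agent.", "options": {}},
}


def run_call(digits, account):
    """Walk the menu tree for a sequence of key presses; return the prompts played."""
    state = "main"
    played = [MENUS[state]["prompt"]]
    for digit in digits:
        options = MENUS[state]["options"]
        if not options:
            # Leaf menu reached: the call flow is finished.
            break
        if digit not in options:
            # Unknown key: repeat the current menu instead of moving on.
            played.append("Invalid choice. " + MENUS[state]["prompt"])
            continue
        state = options[digit]
        played.append(MENUS[state]["prompt"].format(**account))
    return played


if __name__ == "__main__":
    for line in run_call("91", {"balance": "1500"}):
        print(line)
```

Keeping the menu tree in a data structure rather than in code means a new branch can be added without touching the call-handling logic. In a deployed system each prompt would be played through the telephony server rather than returned as text.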

The text-to-speech system is the most widely used system in speech technology. Various text-to-speech synthesizer systems are available, such as Festival, Multilingual, and Flite. A Text-To-Speech (TTS) system is a computer-based system that should be able to read any text aloud, whether it was entered directly into the computer by an operator or scanned and submitted to an Optical Character Recognition (OCR) system. Speech synthesis is a process in which verbal communication is replicated through an artificial device. A computer that converts text to speech is one kind of speech synthesizer. In the business world, the need for spoken output is very common, especially for telephone transactions. Without text-to-speech (TTS) alternatives, business owners would have to spend money hiring even more customer service personnel. Synthesized solutions avoid this problem, since everything is done by a computer, not a human being. Depending on the level of sophistication of the individual device, the sounds produced may be somewhat stilted and artificial sounding, or may sound very much like the voice of a real person. The concept of speech synthesis has been around for centuries, but only in recent decades has the process become available to the general public. It is the digitized audio rendering of computer text into speech.

ESRSA Publication © 2012 http://www.ijert.org

TTS software can read text from a document, Web page, or e-book, generating synthesized speech through a computer's speakers. It can be used for a variety of applications, such as an email reader or a text reader. Text-to-speech systems are increasingly becoming an essential component of different types of computing systems. Also known as an artificial voice synthesizer, a text-to-speech system can produce a human voice artificially from a given string. Developing a text-to-speech system for a language that also supports input in other languages can help not only users who know that language but are not familiar with its keyboard layout, but also international users who do not know that language at all and can therefore type in it using their local keyboard layout.

Text-to-speech technology is a branch of Artificial Intelligence. Text-to-speech synthesis is a voice/speech technology in which raw text is converted into audible speech. Text To Speech (TTS) is a process through which input text is analyzed, processed, and understood, and then rendered as digital audio and spoken. A TTS system consists of two major components:

  • Natural Language Processing (NLP)

  • Digital Signal Processing (DSP).

The process of TTS conversion allows the transformation of a string of phonetic and prosodic symbols into a synthetic speech signal. The quality of the result produced by a TTS synthesizer is a function of the quality of the string, as well as of the quality of the generation process. The most important qualities of a speech synthesis system are naturalness and intelligibility. Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood. The ideal speech synthesizer is both natural and intelligible, and speech synthesis systems usually try to maximize both characteristics. The two primary technologies for generating synthetic speech waveforms are concatenative synthesis and formant synthesis. Each technology has strengths and weaknesses, and the intended uses of a synthesis system will typically determine which approach is used. The basic types of synthesis system are the following:

  • Formant

  • Concatenated

  • Prerecorded
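As a rough illustration of what the NLP component described above does before any waveform is generated, the following Python sketch performs a very small text normalization step: it expands a few abbreviations and spells out digits so that every token is pronounceable. The abbreviation table and the digit-by-digit reading rule are assumptions chosen for illustration; they are not taken from Festival, eSpeak, or the system described in this paper.

```python
import re

# Hypothetical abbreviation table; a real front end would use a much larger lexicon.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "etc.": "et cetera"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def normalize(text):
    """Expand abbreviations and spell out digits so each token can be spoken."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]+", token):
            # Read numbers digit by digit, as an IVR might read an account number.
            words.extend(DIGITS[int(ch)] for ch in token)
        else:
            # Strip punctuation that carries no pronunciation of its own.
            words.append(re.sub(r"[^a-z']", "", token))
    return [w for w in words if w]


if __name__ == "__main__":
    print(normalize("Dr. Smith, account 402"))
```

The output of a step like this would then be converted to phonetic and prosodic symbols and handed to the DSP component for waveform generation.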

Free & Open Source Software (FOSS) for IVR

Open-source software (OSS) is computer software that is available in source code form: the source code and certain other rights normally reserved for copyright holders are provided under a software license that permits users to study, change, improve, and redistribute the software. Source code is the set of instructions that programmers write to create a piece of software, the "recipe" for the program. Once a program has been "compiled" into a form which can be installed and run on a computer, its source code cannot practically be recovered from the compiled form.

It is practically impossible to make changes to a program without having a copy of its source code. If a program's license includes the right to modify the program, this right is meaningless unless the source code is readily available.

Tools/Applications Required for Setting Up an IVR System

| S. No. | Basic Requirement | Optional Requirements |
|---|---|---|
| 1. | CentOS Linux operating system | Apache web server |
| 2. | Asterisk | GUI Scripts |
| 3. | Festival / eSpeak Text to Speech | IPtables / PHP |
| 4. | MySQL database | Sendmail |

Linux is a perfect operating system for computer telephony because:


1. It is freely available and easily accessible. Any distribution can be downloaded from the internet.

2. It provides a reliable server platform suitable for the background processes usually associated with telephony.

3. It is open source. The Linux kernel can be recompiled or optimized to run in a small footprint, which is useful for embedded applications.

4. It has excellent support for secure remote administration.

5. It has amassed extraordinary momentum in the software developer community and enjoys growing hardware vendor support.

For these reasons, many enterprise-scale telephony systems and large PBXs run some form of UNIX-based operating system such as Linux.

Text To Speech (TTS)

eSpeak: eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux. eSpeak uses a "formant synthesis" method, which allows many languages to be provided in a small size. The speech is clear and can be used at high speeds, but it is not as natural or smooth as that of larger synthesizers based on human speech recordings. eSpeak includes different voices, whose characteristics can be altered, and it can produce speech output as a WAV file. SSML (Speech Synthesis Markup Language) is supported (though not completely), as is HTML.

Festival: Festival offers a general framework for building speech synthesis systems and includes examples of various modules. As a whole, it offers full text-to-speech through a number of APIs: from the shell level, through a Scheme command interpreter, as a C++ library, from Java, and through an Emacs interface. Festival is multilingual (currently English (British and American) and Spanish), though English is the most advanced. Other groups release new languages for the system.
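Both engines described above can be driven from the shell, which makes it straightforward to pre-render IVR prompts as WAV files. The following Python sketch builds the command lines and runs them only if the engine is installed. It assumes the `espeak` binary with its `-v` (voice), `-s` (speed in words per minute), and `-w` (WAV output file) options, and Festival's `text2wave` script with its `-o` option reading text from standard input; the function names, default voice, and default speed are hypothetical choices for illustration.

```python
import shutil
import subprocess


def espeak_command(text, wav_path, voice="en", speed_wpm=150):
    """Build an eSpeak command line that writes the spoken text to a WAV file."""
    return ["espeak", "-v", voice, "-s", str(speed_wpm), "-w", wav_path, text]


def festival_command(wav_path):
    """Build a Festival text2wave command; the text is supplied on stdin."""
    return ["text2wave", "-o", wav_path]


def synthesize(text, wav_path, engine="espeak"):
    """Run the chosen engine if it is installed; return True on success."""
    if engine == "espeak":
        cmd, stdin_text = espeak_command(text, wav_path), None
    else:
        cmd, stdin_text = festival_command(wav_path), text
    if shutil.which(cmd[0]) is None:
        return False  # engine is not installed on this machine
    result = subprocess.run(cmd, input=stdin_text, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    ok = synthesize("Welcome. Press 1 for account balance.", "welcome.wav")
    print("synthesized" if ok else "engine unavailable or failed")
```

Depending on the telephony server, the resulting file may still need to be converted to the sample rate and encoding that the server expects before it can be played to a caller.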

DTMF DECODER

This circuit detects the dialing tones on a telephone line and decodes which key was pressed on the remote telephone. The tones heard when keys are pressed on a telephone keypad are called Dual Tone Multi-Frequency, or DTMF for short. The name reflects the fact that each tone heard over the phone is actually made up of two distinct frequencies. DTMF signaling is a form of one-way communication between the dialer and the telephone exchange. A complete communication path consists of the tone generator and the tone decoder.

As technology matured, tone dialing was introduced alongside pulse dialing for telephony communication. It uses electronics and computers to assist in the phone line connection. On the caller's side there is a tone generator. In DTMF there are 16 distinct tones. Each tone is the sum of two frequencies: one from a low-frequency group and one from a high-frequency group. There are four different frequencies in each group. A normal telephone uses only 12 of the possible 16 tones. The keypad is a matrix of 4 rows and 4 columns, where the rows and columns select frequencies from the low- and high-frequency groups respectively. When a key is pressed on the matrix keypad, it generates a unique tone consisting of two audible frequencies. For example, if the key '1' is pressed, the tone heard consists of a 697 Hz and a 1209 Hz sine signal. Pressing key '9' generates the tone formed by 852 Hz and 1477 Hz. The frequencies used in the tone dialing system are in the audible range and are suitable for transmission over the telephone cable.

The exact values of the frequencies are listed below:

FIGURE: DTMF Table of Frequency Combinations
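The row and column assignment described above can be written out as a lookup table. The following Python sketch uses the standard DTMF frequencies (rows 697, 770, 852, 941 Hz; columns 1209, 1336, 1477, 1633 Hz) and generates a tone as the sum of two sine waves. The function names, the 8000 Hz sample rate, and the 0.1 s duration are assumptions made for illustration.

```python
import math

LOW_HZ = [697, 770, 852, 941]        # row frequencies (low group)
HIGH_HZ = [1209, 1336, 1477, 1633]   # column frequencies (high group)
KEYPAD = ["123A", "456B", "789C", "*0#D"]  # 4 x 4 key matrix


def key_frequencies(key):
    """Return the (low, high) frequency pair in Hz for one keypad symbol."""
    if len(key) != 1:
        raise ValueError(f"expected a single key, got {key!r}")
    for row, symbols in enumerate(KEYPAD):
        col = symbols.find(key)
        if col != -1:
            return LOW_HZ[row], HIGH_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_s=0.1, sample_rate=8000):
    """Generate the tone for a key as the sum of two equal-amplitude sine waves."""
    low, high = key_frequencies(key)
    count = round(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * n / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * n / sample_rate)
        for n in range(count)
    ]


if __name__ == "__main__":
    for key in "19":
        print(key, key_frequencies(key))
```

Each sine wave is scaled by 0.5 so that the summed signal stays within the range -1 to 1.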

The tone frequencies associated with a particular key are determined as follows. Each key is specified by its row and column location. For example, the "2" key is in row 0 (R1) and column 1 (C2). Thus, using the table, "2" is composed of 697 Hz and 1336 Hz. The "9" key is in row 2 (R3) and column 2 (C3) and is composed of 852 Hz and 1477 Hz. The two frequencies are transmitted together as a sum of two sine waves, not as a single tone at the arithmetic sum of the frequencies. On the telephone exchange side, a decoder circuit converts the tone to a digital code. For example, the tone of 941 Hz + 1336 Hz (the "0" key) will be decoded as binary '1010' at the output. This digital output is read by a computer, which then acts as an operator to connect the caller's telephone line to the designated phone line. The telephone exchange generates a high-voltage signal to the receiving telephone to ring its bell, notifying the receiving user that there is an incoming call.
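The decoding step described above can also be done in software. The following Python sketch measures the signal power at each of the eight DTMF frequencies with the Goertzel recurrence, picks the strongest row and column frequency, and maps the key to a 4-bit code. The paper describes a hardware decoder circuit, so this is a swapped-in software technique and not the authors' method. The full code table in `CODE_ORDER` is an assumption that follows the convention of common decoder ICs; only the mapping of "0" to '1010' is stated in the text.

```python
import math

LOW_HZ = [697, 770, 852, 941]        # row frequencies (low group)
HIGH_HZ = [1209, 1336, 1477, 1633]   # column frequencies (high group)
KEYPAD = ["123A", "456B", "789C", "*0#D"]

# Assumed code table: the position in this string is the 4-bit output value,
# so "1".."9" map to 1..9 and "0" maps to 10 (binary 1010).
CODE_ORDER = "D1234567890*#ABC"


def goertzel_power(samples, freq_hz, sample_rate):
    """Signal power at freq_hz, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode_key(samples, sample_rate=8000):
    """Return the keypad symbol whose row and column frequencies are strongest."""
    low = max(LOW_HZ, key=lambda f: goertzel_power(samples, f, sample_rate))
    high = max(HIGH_HZ, key=lambda f: goertzel_power(samples, f, sample_rate))
    return KEYPAD[LOW_HZ.index(low)][HIGH_HZ.index(high)]


def key_to_code(key):
    """Map a keypad symbol to its assumed 4-bit decoder output value."""
    return CODE_ORDER.index(key)


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 941 * n / rate)
            + math.sin(2 * math.pi * 1336 * n / rate) for n in range(400)]
    key = decode_key(tone, rate)
    print(key, format(key_to_code(key), "04b"))
```

A practical decoder would additionally check that both detected powers exceed a threshold, so that silence or speech is not reported as a key press.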


Conclusion

Interactive Voice Response systems are an interesting technology for supporting mobility in modern IT projects, but using this technology requires additional thought about its restrictions. Not every use case can be usefully expressed by voice dialogs, and voice processing and telephony technology is more expensive than other traditional technologies. One main advantage remains: all the caller needs is a phone. An IVR system is a powerful tool for increasing customer satisfaction, and it can help reduce the overall cost of an office or call center while maintaining or even increasing the number of incoming calls handled. Developers and testers, in particular those who are new to the telephony domain, have experienced difficulties while building IVR applications, especially for the IVR functionality mentioned above. By using the right strategies and proven best practices with open source tools, as described in this paper, organizations can avoid difficulties that can result in financial losses and customer dissatisfaction.

Tentative Production Cost

By using open source software, telephony systems can be built for the price of a telephony card, a PC, and a little effort. This can give a company a tremendous cost advantage over traditional business models that charge heavily for proprietary software. End users can use this cost advantage to build and maintain their own low-cost, high-quality telephony systems.

| Component | Cost |
|---|---|
| PC | 20000 |
| CTI Card | 12000 |
| Operating System (Linux) | Free |
| Application Software (Database, Dialer, etc.) | Free |
| Total | 32000 |


Anil Kumar completed his Master's degrees (MCA and MBA) and is currently pursuing his Ph.D. at CMJ University. He also holds professional certifications such as MCSE, MCDBA, CCNA, and RHCE (Tr.), and presently works as a System Engineer at PDM Engineering College, Bahadurgarh (Haryana). He has 10 years of experience in technical support for heterogeneous networks and is involved in multiple research areas such as open source implementations, speech processing, and various computing technologies.
