Automated Teleprompter

M.M.S.A. Paris, G.W.G.K.N. Udayanga, W.M.T.C.B. Weerakoon, K.W.C. Mihiran, C.R.D. Silva
Department of Computer Science and Engineering, University of Moratuwa, Moratuwa, Sri Lanka.

Abstract- This project automates the current manual teleprompting system. Teleprompting is one of the most widely used techniques in broadcasting, yet today the process is still handled manually. Our aim is to automate teleprompting so that no manual intervention is needed to operate it. The project consists of two main parts: Teleprompter Automation, and a graphical sign-language implementation that illustrates the speech for hearing-impaired viewers.

KeyWords – Teleprompter, Sign language, Signal processing, MATLAB, C#.

I. INTRODUCTION
In public speaking it is very important to appear confident about the speech you are delivering. Audiences pay more attention to speakers who show confidence in their subject, so it is necessary to keep the main points intact. However, as the duration of a speech increases, so does the amount of detail, which makes the speech hard to memorize. A presenter who is visibly trying to recall the points no longer sounds spontaneous, and this does not give a good impression to the audience. A teleprompter lets speakers deliver a speech confidently even if they cannot remember its exact details, and lets them maintain eye contact with the audience while reading the script. The idea of Teleprompter Automation emerged from the need to make teleprompting much easier for people in the broadcasting world.
Today, teleprompting is not easy for a single operator or a team to control manually in real time: the task itself is demanding, and operators can make serious mistakes. Automating the process makes it less error-prone and lets administrators handle it more easily and efficiently. The second part, a graphical sign-language implementation that illustrates speeches or conversations, is intended to support hearing-impaired people, an audience that genuinely needs proper support from current technology.

II. LITERATURE REVIEW
A. Teleprompter
A teleprompter is a display device that prompts the speaker with an electronic visual text of a speech or script. The screen is placed in front of the performer, and the words on it are reflected towards the performer's eyes, normally by a sheet of clear glass. A teleprompter system therefore consists mainly of a video monitor that displays the script and a clear glass that reflects it to the presenter's eyes without the text being visible to the audience. This is the usual setup for teleprompters used in public speeches. In news reporting, however, the presenter must keep direct eye contact with the video camera, so a hardware setup such as the one in Fig. 1 is needed to run the teleprompting process without losing eye contact with the camera.

Figure 1: Schematic representation of a Teleprompter [1]: (1) Video camera (2) Shroud (3) Video monitor (4) Clear glass or beam splitter (5) Image from subject (6) Image from video monitor

B. Signal processing
Signal processing here involves extracting the human voice frequency range from the audio input of the presenter's microphone.
Because the microphone input also contains frequencies outside the human voice range, these must be filtered out before voice activity is analyzed. This step improves the accuracy of the later calculations, since the noise frequencies no longer affect the final result. MATLAB provides many built-in audio processing functions that the project requires, such as real-time wave recording and Butterworth filtering. In addition, MATLAB's ability to plot large data sets in subplot graphs was helpful.

wavrecord is a built-in MATLAB function for recording sound using a PC-based audio input device [2]. It is useful for capturing the presenter's voice live and storing it for sound processing: wavrecord captures the sound and stores it as a matrix of a chosen data type.

A band-pass filter is a device that passes frequencies within a certain range and rejects (attenuates) frequencies outside that range [3]. It is a combination of a low-pass filter and a high-pass filter, used here to extract only the human voice range of 200 Hz – 7000 Hz. "butter designs lowpass, bandpass, highpass, and bandstop digital and analog Butterworth filters. Butterworth filters are characterized by a magnitude response that is maximally flat in the passband and monotonic overall" [4]. The butter function [5][6] can be used in high-pass and low-pass modes, which is essential for removing non-vocal sounds from the sound input.

Voice activity detection (VAD), also known as speech activity detection or speech detection, is a technique used in speech processing in which the presence or absence of human speech is detected [7].

C. Sign Language Related Study
A sign language is a language which, instead of acoustically conveyed sound patterns, uses visually transmitted sign patterns (manual communication, body language) to convey meaning, simultaneously combining hand shapes, orientation and movement of the hands, arms or body, and facial expressions to fluidly express a speaker's thoughts [8][9]. Sign languages follow different conventions: some represent a word with a sign, while others represent a letter with a sign. For a real-time system like ours, the sign-per-word representation is the more efficient choice. In most cases there is a one-to-one mapping between words and signs, although some signs represent several words.

III. DESIGN AND IMPLEMENTATION
The basic research for the Automated Teleprompter project was done in MATLAB. With MATLAB we investigated voice-to-text mapping, which is the core requirement for scrolling the teleprompter screen. The other part of the research addressed the avatar implementation, for which we had to illustrate words in the Sinhala language graphically, since the project focuses on that language. Finally, we converted the MATLAB code to C#, through which we linked the voice-to-text mapping part with the text-to-sign-language video library. Fig. 2 shows the basic functionalities of this project, and the following subsections describe the methodologies we used.

A. MATLAB prototype
To understand the basic requirements of accurate voice recognition for word detection, we implemented a prototype based on "real-time word detection". Since developing this prototype involved considerable research, we used MATLAB, which is optimized for engineering research and simulation, rather than a general-purpose development language. The prototype code was encapsulated in two m-files, Simulate.m and Splitter.m.
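The band-pass stage described in Section II-B, which the prototype applies to each one-second block of recorded voice, can be sketched without MATLAB as a cascade of a second-order Butterworth high-pass section and a second-order Butterworth low-pass section, following the "low-pass plus high-pass" description above. This is an illustrative Python sketch, not the authors' MATLAB or C# code. The coefficients use the widely used Audio EQ Cookbook bilinear-transform formulas with Q = 1/√2, which yields a second-order Butterworth response. Because the project samples at 8000 Hz, the Nyquist frequency is 4000 Hz, so a digital filter at this rate cannot pass content up to 7000 Hz. The sketch therefore assumes a hypothetical upper cut-off of 3400 Hz (the classic telephone-band edge) together with the 200 Hz lower cut-off. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Biquad coefficients (b0, b1, b2, a1, a2), normalised so that a0 == 1.
Coeffs = Tuple[float, float, float, float, float]


def butterworth_biquad(kind: str, cutoff_hz: float, fs_hz: float) -> Coeffs:
    """Second-order Butterworth low-pass or high-pass section.

    Bilinear-transform (Audio EQ Cookbook) formulas with Q = 1/sqrt(2),
    which gives the maximally flat Butterworth magnitude response.
    """
    if not 0 < cutoff_hz < fs_hz / 2:
        raise ValueError("cut-off must lie between 0 Hz and the Nyquist frequency")
    w0 = 2 * math.pi * cutoff_hz / fs_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / math.sqrt(2)  # sin(w0) / (2Q) with Q = 1/sqrt(2)
    a0 = 1 + alpha
    if kind == "lowpass":
        b0, b1 = (1 - cos_w0) / 2, 1 - cos_w0
    elif kind == "highpass":
        b0, b1 = (1 + cos_w0) / 2, -(1 + cos_w0)
    else:
        raise ValueError("kind must be 'lowpass' or 'highpass'")
    # b2 equals b0 for both the low-pass and the high-pass section.
    return (b0 / a0, b1 / a0, b0 / a0, -2 * cos_w0 / a0, (1 - alpha) / a0)


def biquad_filter(x: Sequence[float], c: Coeffs) -> List[float]:
    """Apply one biquad section (direct form I) to a block of samples."""
    b0, b1, b2, a1, a2 = c
    x1 = x2 = y1 = y2 = 0.0
    y: List[float] = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def voice_bandpass(x: Sequence[float], fs_hz: float = 8000.0,
                   low_hz: float = 200.0, high_hz: float = 3400.0) -> List[float]:
    """Band-pass as a cascade: high-pass at low_hz, then low-pass at high_hz."""
    hp = butterworth_biquad("highpass", low_hz, fs_hz)
    lp = butterworth_biquad("lowpass", high_hz, fs_hz)
    return biquad_filter(biquad_filter(x, hp), lp)


if __name__ == "__main__":
    fs = 8000
    one_second = range(fs)  # one block of 8000 samples, as in the prototype
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in one_second]  # in band
    hum = [math.sin(2 * math.pi * 60 * n / fs) for n in one_second]     # below band
    print(max(abs(v) for v in voice_bandpass(tone)[fs // 2:]))  # close to 1
    print(max(abs(v) for v in voice_bandpass(hum)[fs // 2:]))   # strongly attenuated
```

Two second-order sections keep the sketch short; a higher-order Butterworth design (for example one produced by MATLAB's butter or SciPy's scipy.signal.butter) would give steeper band edges at the cost of more computation per sample.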
Simulate.m contains the voice recording functionality. Since discrete capture at 8000 Hz (8000 samples per second) is sufficient to recognize continuous human speech, we recorded in mono mode at this rate. Each one-second block of recorded voice data is then band-pass filtered to preserve the 200–7000 Hz range that contains the human voice frequencies; as the band-pass filter we used MATLAB's Butterworth filter. The filtered data is collected in a one-dimensional matrix.

When discrete voice samples are obtained, some zero-amplitude samples occur between the letters of a single word. We therefore need to account for the correlation among neighboring samples, which restores the continuous nature of human speech. Many differences and variations in the human voice can also be captured through the voice energy, computed for each sample together with the samples around it [10][11]. After obtaining the energy variation of the samples, we apply a magnitude threshold to obtain a step function whose binary output marks speaking and pausing times. We feed this step function to the Splitter function, which contains the logic for determining whether a word is present.

Figure 2: A high-level representation of the basic functionalities of the Automated Teleprompter

Splitter.m is a function that accepts the step function (stepFn) and several other variables from Simulate and passes them through the logical pipeline we use for word detection. Its return values contain the desired output (wordFn) and several synchronization and chaining variables carried between the one-second processing periods. By nature, when a person speaks there are silent gaps between letters, between words, and between sentences, in increasing order of length. We can therefore detect the start of a word using an appropriate threshold on the pausing time between words.
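The Simulate.m and Splitter.m chain described above (energy of each sample together with its neighbors, a magnitude threshold that yields a binary step function, and word detection from pauses, with state carried between one-second blocks) can be sketched as follows. This is a hypothetical Python rendering, not the authors' code: the centered moving-window energy, the class name StreamingWordCounter, and all parameter values are assumptions. The minimum-duration rule (a word is accepted only once its voiced time reaches its letter count multiplied by a minimum letter time) follows the refinement explained in the next paragraph of this section.

```python
from typing import List, Sequence


def short_time_energy(x: Sequence[float], half_window: int) -> List[float]:
    """Energy of each sample together with its neighbours:
    E[n] = sum of x[k]**2 for k in [n - half_window, n + half_window]."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v * v)  # running sum of squares
    n = len(x)
    return [prefix[min(n, i + half_window + 1)] - prefix[max(0, i - half_window)]
            for i in range(n)]


def step_function(energy: Sequence[float], threshold: float) -> List[int]:
    """1 where the energy exceeds the magnitude threshold (speaking), else 0."""
    return [1 if e > threshold else 0 for e in energy]


class StreamingWordCounter:
    """Counts script words in a 0/1 step function delivered in blocks
    (e.g. 8000 samples per second). State survives between blocks, like the
    chaining variables returned by Splitter.m (hypothetical sketch)."""

    def __init__(self, word_lengths: Sequence[int], letter_samples: int,
                 silent_threshold: int) -> None:
        if not word_lengths:
            raise ValueError("need the letter counts of the script words")
        self.word_lengths = list(word_lengths)  # letters per expected script word
        self.letter_samples = letter_samples    # minimum samples per letter
        self.silent_threshold = silent_threshold  # pause (samples) between words
        self.voiced_run = 0    # voiced samples of the current word candidate
        self.silence_run = 0   # length of the current pause
        self.counted = False   # has the current candidate been accepted?
        self.words = 0

    def _min_voiced(self) -> int:
        # Fastest plausible duration of the expected word:
        # its letter count times the minimum time per letter.
        i = min(self.words, len(self.word_lengths) - 1)
        return self.word_lengths[i] * self.letter_samples

    def feed(self, steps: Sequence[int]) -> int:
        """Process one block of the step function; return the word count so far."""
        for s in steps:
            if s:
                self.silence_run = 0
                self.voiced_run += 1
                if not self.counted and self.voiced_run >= self._min_voiced():
                    self.words += 1
                    self.counted = True
            else:
                self.silence_run += 1
                # Only a long pause after an accepted word opens the next word;
                # long pauses inside a still-too-short candidate are treated
                # as intra-word (syllable) gaps.
                if self.counted and self.silence_run >= self.silent_threshold:
                    self.voiced_run = 0
                    self.counted = False
        return self.words
```

In use, each one-second block would pass through short_time_energy and step_function and then into feed; because the counter keeps its own state, a word that straddles two blocks is still counted once.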
However, some silent intervals within a single spoken word are as long as or longer than the silent threshold, because of syllables and similar effects. We therefore also consider the approximate time needed to speak a particular word. Given a rough minimum time (in samples) to speak a letter and the length of the word, we can estimate the time needed to speak that word at the fastest possible speaking rate, so obtaining a good value for the minimum letter time is very important. Since we have the script, we can track the expected word and its length. In our logic, when the start of a word is detected, we store the length of the silence that preceded it in a temporary variable. We then check whether the speaker speaks for more than 100 samples, and a word is detected only if the speaking time exceeds the minimum time expected for that word (its length multiplied by the minimum letter time). Fig. 3 shows the implementation of the MATLAB prototype.

Figure 3: The implementation of the MATLAB prototype

B. Live voice capturing and real-time voice processing
For real-time audio recording we use the "NAudio" audio library for C#. We capture 8000 samples of live voice per second and send each one-second block of samples to the filter for processing.

C. Separation of the human voice frequency range and noise filtering using the sound processing module
To separate the human voice we use the Butterworth filter [4], which keeps the frequency components between 200 Hz and 7000 Hz that cover the human voice. To eliminate the noise within this frequency range, we calculate the energy [10][11] of each sample and apply a threshold to the resulting magnitudes. To simplify the analysis, we then derive a step function that sets noise components to zero and actual speech components to one; the magnitude threshold is what separates these two classes.

D. Word counting and script scrolling module
After applying the magnitude threshold, we obtain a function with voice-active and voice-inactive regions for each second. We then execute the wordSplit function, which contains the algorithm developed in the MATLAB prototype: if a voice-inactive region exceeds a threshold value (obtained experimentally), the next voice-active region is treated as a word and the word counter is incremented. These three steps are shown in Fig. 4.

Figure 4: (1) Energy variation of the speech signal (2) Step function (3) Word positions

The script scrolling module imports the script as a .TXT file. Because the program was developed for Sinhala, the .TXT file must be Unicode-encoded, and only scripts encoded in Unicode within the Sinhala Unicode range are supported. The script is first passed through filters that remove unnecessary characters and format it accordingly. It is then split into sentences and words so that the flow of words can be displayed easily in the teleprompter interface. This task is also carried out by the Script Controlling Module: the sentences and words are split and indexed into arrays, allowing easy access to any word in a given sentence.

E. Sign language avatar and Avatar controller module
As discussed in Section II-C, a sign-per-word representation is the most efficient choice for a real-time system: in most cases words map one-to-one to signs, although some signs represent several words [8]. Sign language movements were simulated using a sign language video library, and the Avatar controller module is driven by the Script controller module of the system.
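The script preparation of Section III-D and the per-word sign lookup used by the Avatar controller can be sketched together. In this hypothetical Python sketch, characters outside the Sinhala Unicode block (U+0D80–U+0DFF) are dropped except whitespace, sentence punctuation, and the zero-width joiner (which Sinhala uses to form some conjunct letters). The script is then indexed as a list of sentences, each a list of words, and a dictionary stands in for the property file that maps words to clips in the video library. The kept punctuation set, the clip names, the greedy longest-match rule for multi-word signs, and all function names are assumptions, not details of the original system.

```python
import re
from typing import Dict, List, Tuple

SINHALA_BLOCK = (0x0D80, 0x0DFF)  # Unicode Sinhala block
# Whitespace, sentence terminators, and ZERO WIDTH JOINER (Sinhala conjuncts).
KEEP = set(" \t\n.?!") | {"\u200d"}


def clean_script(text: str) -> str:
    """Drop every character that is neither Sinhala nor in KEEP."""
    lo, hi = SINHALA_BLOCK
    return "".join(ch for ch in text if lo <= ord(ch) <= hi or ch in KEEP)


def index_script(text: str) -> List[List[str]]:
    """Split the cleaned script into sentences of words,
    so that script[s][w] is word w of sentence s."""
    sentences = re.split(r"[.?!]+", clean_script(text))
    return [s.split() for s in sentences if s.split()]


def sign_clips(words: List[str], clip_map: Dict[Tuple[str, ...], str],
               max_phrase: int = 3) -> List[Tuple[int, str]]:
    """Greedy longest match of word sequences against the sign map.

    Returns (index of first word, clip) pairs; words with no sign are skipped.
    """
    result: List[Tuple[int, str]] = []
    i = 0
    while i < len(words):
        for n in range(min(max_phrase, len(words) - i), 0, -1):
            clip = clip_map.get(tuple(words[i:i + n]))
            if clip is not None:
                result.append((i, clip))
                i += n  # a multi-word sign consumes all of its words
                break
        else:
            i += 1  # no sign for this word
    return result
```

With a map containing both a single-word and a two-word entry, the two-word sign is preferred whenever both words appear in sequence; the Avatar controller would then play the clip when the script position reaches the returned word index.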
The Avatar controller module follows the Script controller module, so that the position of the avatar animation is derived from the script position of the given text. The main function of the Avatar controller module is to get the current script position from the Script Controller Module, check whether the detected Sinhala Unicode word exists in the property file, and, if it does, fetch the clip from its mapped position in the video database. The video is then sent to the Avatar Window and played to show the sign-language posture to the viewer.

IV. RESULTS
Many software packages [12][13][14] are available for teleprompting. However, all of them use a manual timer and a separate human controller to scroll the script according to the presenter's voice. The main disadvantage is that the scrolling can run faster or slower than the presenter's speaking rate. Our system was intended to automate this scrolling and reduce such out-of-sync situations. To our knowledge, it is the only system that uses the presenter's voice input to track the position in the script. Using the presenter's voice enables automation and many other possibilities: with the script in hand, we extracted as much information from it as possible to drive the automation. This also allowed us to implement the sign-language avatar synchronized with the presenter's voice, as well as real-time prompt messages that let the director communicate with the presenter. Fig. 5 shows the interface of the Automated Teleprompter.

The MATLAB prototype produced promising results, with some restrictions. It performs word splitting accurately within certain constraints, and its accuracy depends on the speaker's voice characteristics. By tuning the constants in the prototype, the system achieved higher accuracy in word splitting. The first parameter to estimate is the letter time, which specifies the minimum time needed to speak a single letter.
From our test simulations, we found the minimum letter time to be approximately 250 samples at our sampling rate. The silent threshold for splitting words was found to be 430 samples at the same rate. These values were measured for different speakers speaking at different speeds to obtain a good statistical average.

Figure 5: GUI module of the Automated Teleprompter

The word detection magnitude threshold and the voice-inactive interval (silent threshold) are the two main parameters that gave notable results. When the speaker speaks at a somewhat slow rate, word detection works very well and captures all of the words. When a word is not detected properly, the display can lag by 2–3 words, and controls are provided to restore synchronization.

The sign language avatar (Fig. 6) is an innovative idea. Currently there is no sign language video library available for Sinhala; sign-language interpretation is mostly done manually and lags considerably behind the voice. We created a Sinhala sign language video library from available video clips and used it to present the speech in sign language. The mapping is implemented with a property file, which makes it easy to add more words or to convert to other languages. The sign language avatar was a success, owing to our iterative development and the frame rate achieved.

Figure 6: Sign language Avatar

V. CONCLUSION
The Automated Teleprompter surpassed our expectations and achieved a great deal in multiple fields. Although the accuracy of the system is not 100%, we achieved very high standards for a semi-automated system. One lesson from our research is that full automation is very hard to achieve without a training session that analyzes the presenter's voice, which imposes a large overhead for speech recognition.
We can therefore say that our method delivered an optimal output for a system that requires no initial training from presenters.

VI. FUTURE WORK
Although we have provided extensive functionality, more could be done to develop the Automated Teleprompter project further. So far we have covered almost all the basic requirements for automating teleprompting and the avatar animation mapped to the given script. The product is complete enough to be used at this stage, but adding more advanced features beyond the current scope would improve it further. The following areas are some of the ways the scope could be extended.

At present the system supports only Sinhala Unicode; it was developed for speakers reading Sinhala scripts. It could be extended to other languages using the relevant Unicode ranges.

Signal processing is one of the most important areas of this project, since the automation of the teleprompter depends entirely on how the input audio signal is processed. After extensive filtering and other mechanisms, the current accuracy of the signal processing mechanism is approximately 85%. Improving this accuracy further would greatly enhance the quality of the automation.

Sign-language avatar animation for hearing-impaired people is another very important area of this project. The avatar controller unit depends on the script controller for the current script position so that it can play the corresponding video in real time, but checking for, fetching, and processing the videos are handled entirely by the avatar controller unit.
Further fine-tuning the existing implementation to optimize this checking, fetching, and processing would improve the system's performance and memory management.

VII. ACKNOWLEDGMENT
The authors would like to thank Dr. Shantha Fernando for coordinating this module. Special thanks go to all the academic and support staff of the Department of Computer Science and Engineering, University of Moratuwa, and to all our fellow batch mates for their support.

REFERENCES
[1] Teleprompter [Online] [Accessed: May 06, 2011] http://en.wikipedia.org/wiki/Teleprompter
[2] wavrecord [Online] [Accessed: May 20, 2011] http://www.mathworks.com/help/techdoc/ref/wavrecord.html
[3] Band-Pass filter [Online] [Accessed: May 06, 2011] http://www.electronics-tutorials.ws/filter/filter_4.html
[4] Butterworth filter design [Online] [Accessed: May 20, 2011] http://www.mathworks.com/help/toolbox/signal/butter.html
[5] Wireless Engineer (also called Experimental Wireless and the Wireless Engineer), vol. 7, 1930, pp. 536–541.
[6] S. Butterworth, On the Theory of Filter Amplifiers. www2.ee.ufpe.br/codec/paper%20BUTTERWORTH.pdf
[7] J. Ramírez, J. M. Górriz, J. C. Segura, Voice Activity Detection. Fundamentals and Speech Recognition System Robustness, 2007.
[8] Sign language [Online] [Accessed: May 21, 2011] http://en.wikipedia.org/wiki/Sign_language
[9] Rohana Special School [Online] [Accessed: May 18, 2011] http://www.rohanaspecialschool.org/
[10] Energy (signal processing) [Online] [Accessed: July 03, 2011] http://en.wikipedia.org/wiki/Energy_%28signal_processing%29
[11] M. P. Norton and D. G. Karczub, Fundamentals of Noise and Vibration Analysis for Engineers, Cambridge University Press, 2003.
[12] Free Teleprompter [Online] [Accessed: Oct 03, 2010] http://www.freetelepromptersoftware.com/
[13] PromptDog [Online] [Accessed: Oct 03, 2010] http://www.promptdog.com/
[14] SignGenius [Online] [Accessed: Oct 03, 2010] http://www.signgenius.com/signlanguage-american.shtml