REAL TIME DATA TRANSMISSION OVER GSM VOICE CHANNEL FOR SECURE VOICE & DATA APPLICATIONS

N.N. Katugampala, K.T. Al-Naimi, S. Villette, and A.M. Kondoz
University of Surrey, United Kingdom
Email: [email protected]

ABSTRACT

This paper describes a real time prototype implementation of a system that enables secure voice and data communication over the GSM voice channel. The security of GSM is not guaranteed, especially over the core network. The proposed system modulates digital data, which may be encrypted, onto speech-like waveforms. The modulated waveform is then transmitted over the GSM voice channel and is demodulated and decrypted at the receiver. The real time prototype system has been tested on GSM-to-GSM voice calls, and a proprietary speech codec is used on the real time data channel to produce communication quality speech. A demonstration will be provided at the presentation.

1. INTRODUCTION

The GSM system ensures subscriber identity confidentiality, subscriber authentication, and confidentiality of user traffic and signalling. The ciphering algorithms used in GSM [Lo and Chen 1] have proved to be effective in ensuring traffic confidentiality. However, traffic confidentiality is only ensured across the radio access channel. Voice traffic is transmitted across the core circuit switched networks ‘in clear’ in the form of PCM or ADPCM speech, which opens up the possibility of unauthorised access to GSM-to-GSM or GSM-to-PSTN conversations. Moreover, the security over the GSM speech channel is controlled by the network operator, not the end user. Control by the end user may be preferable in some applications. For guaranteed end-to-end security the speech signal must be encrypted before entering the communications system.

Although the GSM data channel can be used for encrypted speech transmission, this approach suffers from a number of disadvantages. The GSM data channel has interoperability problems, especially across international networks [Street 2]. The GSM data channel typically requires 28-31 seconds to establish a connection, of which approximately 18 seconds are taken up by the GSM modem handshaking time. In addition, the GSM data channel may use automatic repeat request (ARQ) for error correction, which delivers error-free data at the expense of increased delay. The average round-trip time of the GSM data channel is 0.5 seconds [Challans et al 3]. This value depends upon the size of the packets transmitted. In practice this translates into a delay that exceeds the ITU-T specification of 150 ms for one-way transmission time for telephony services [ITU-T G.114 4]. Although the proposed 3GPP standards specify the provision of low-latency data bearer channels, which could be used for end-to-end secure communications or telemetry operations, the deployment dates of such systems are as yet uncertain, and it will be quite some time before 3G mobile systems are ubiquitously available.

On the other hand, the use of encryption on the speech channel is not straightforward. The GSM terminal has a speech compression/decompression process for efficient use of the bandwidth, and this is heavily based on the assumption that the input signal will be speech. It uses the usual speech production model parameters, such as pitch and vocal tract model parameters, to compress the input speech efficiently. If the speech signal is encrypted before it reaches the encoding block, the encryption process randomises it, so it no longer satisfies the expected speech characteristics and fails to pass through the GSM speech transcoding process with sufficient accuracy. A method was presented in [Katugampala et al 5] in which, after the encryption process, the resultant bits are modulated back onto speech-like waveforms that possess the required speech characteristics. This paper presents the progress made since the publication of [5], in terms of testing on public GSM network voice calls, additional problems encountered, proposed solutions, and a real time prototype of the system.
2. VOICE DATA TUNNELLING

The standard modems used in PSTN are not suitable for compressed low bit rate speech channels. The main objective of speech compression is to reduce the number of bits required to represent speech, whilst still retaining an acceptable speech quality level [Kondoz 6]. A side-effect of this approach is that the resulting synthesised speech, whilst perceptually being similar to the input speech, i.e. it sounds very similar to the original, may have a fairly different waveform on a sample-by-sample basis. This objective difference prevents most data modems from operating over channels that employ speech compression systems, and in particular mobile communication systems. This problem is compounded by the fact that in many networks the speech signal may undergo more than one set of compression/decompression stages, a phenomenon known as tandeming. Therefore a different modem was designed for low bit rate speech channels [5]. This modem can be used to transmit any form of general digital data, e.g. encrypted speech. Figure 1 depicts the relationship between the modulator, the demodulator, and the transmission path in a low bit rate voice communication system.

[Figure 1: Modulation over the speech channel of a communications network. Transmitter: input data, data modulator, speech-like waveform, speech compression, compressed ‘speech’. Communications network: data may be subjected to bit errors and packet/block loss. Receiver: compressed ‘speech’, speech decompression, speech-like waveform, data demodulator, output data.]

Figure 2 depicts a more detailed example for the GSM system. The example shown is a typical mobile terminal to mobile terminal communications path, which includes a radio link, a speech decoder at the base station, a core transmission network, a speech encoder at the second base station, and a downlink radio channel. For simplicity only simplex communication is illustrated; however, full duplex secure voice communication is possible using the same techniques.

The input speech signal is first compressed using a very low bit rate speech encoder [Stefanovic et al 7], e.g. 1.2 kbps, in order to fit within the available bandwidth of the voice data tunnel. The output bit stream of the speech encoder can be encrypted. The encrypted speech data is fed into the modulator, which converts it into a speech-like waveform to feed into the GSM handset. The GSM speech encoder in the handset compresses the modulated waveform. The resulting digital bit stream is transmitted over the communications channel. The bit stream is received by the decoder of the receive terminal, which converts it back to a speech-like waveform. The transcoding that takes place within the network causes the waveform generated by the decoder to differ from that produced by the modulator at the transmit end. The demodulator is still able to extract the original transmitted data.

[Figure 2: Overview of the complete system. Transmit side: input speech, speech encoder, data encryption, data modulator (add-on module to be connected to standard GSM handset), handset speech encoder. Network: Base Station Subsystem speech decoder, 64 kbps PCM waveform (GSM to PSTN, PSTN to GSM), Base Station Subsystem speech encoder. Receive side: handset speech decoder, data demodulator, data decryption, speech decoder (add-on module to be connected to standard GSM handset), output speech.]

3. SIMULATIONS

The results presented in [5] were obtained from software simulations written in C. The simulation included a modulator, a double tandem GSM EFR [ETSI GSM 06.60 8] speech transcoding process, and a demodulator. Figure 3 depicts a modulated waveform segment and the signal received at the demodulator in the simulations.

[Figure 3: Synthesised and received speech-like waveforms. Axes: amplitude versus time (samples, 8 kHz). Traces: synthesised signal, received signal.]

However, the real GSM voice channel proved to be more challenging, even under error free conditions. The GSM system includes additional components such as Automatic Gain Control (AGC), Voice Activity Detectors (VAD), and various filters, in addition to the double tandem speech transcoding. These components further degrade the performance. The initial modem [5] needed significant modifications in order to take into account the above listed problems. The modulated signal is monitored for sections without much variation and modified so that triggering of the VAD is avoided.

4. REAL TIME PROTOTYPE

There are several methods to interface the modem to the service access point of the communications network, e.g. the GSM handset.

1. The secure voice system is implemented as a separate add-on module and the interface is provided with cables using the hands free sockets of the GSM handsets.
2. The secure voice system is implemented as a separate add-on module and the interface is provided with a Bluetooth audio link.
3. The modem, encryption/decryption, and the speech codec may be implemented on a personal digital assistant (PDA) with a GSM connection. Then the modulated waveform could be directly copied onto the GSM voice buffer.

Analogue connections add extra distortion and perform worse than digital connections. It should be noted that Bluetooth provides a digital connection, while the hands free cables provide an analogue connection. An integrated PDA implementation, which directly accesses the GSM voice buffers, will not add any distortion due to the interface.

A real time prototype of the system was implemented on two desktop personal computers (PC). Each PC used one 2 GHz Intel Pentium Xeon processor with 2 GB of RAM, running the Microsoft Windows XP operating system. Microsoft Visual C++ library functions were used to read from and write to the sound cards. The interface to the GSM handsets was provided using hands free cables. Creative Sound Blaster Audigy 2 soundcards and various standard Nokia handsets were used.

The present demonstrator is a simplex system, which can be extended to a full duplex system. In order to achieve full duplex secure communication, practical issues such as side tone cancellation and two to four wire transformation need to be considered, depending on the interface and the telephone connection used. Once the complexity reduction techniques currently being investigated are implemented, the complete full duplex secure voice system is expected to run on a modern PDA, e.g. one clocked at 200 to 400 MHz.

4.1 Synchronisation

Synchronisation of the frame boundaries is achieved by using a different modulated signal with a much lower data rate (400 bps). This signal passes through a GSM voice call with virtually no errors. This synchronisation signal is derived from a known set of data stored at both the modulator and the demodulator and is transmitted at the beginning. An additional problem with the analogue connections is the drifting of the digital samples of the modulated signal due to the difference in the transmitter and receiver sound card clock rates. This problem was solved by implementing an additional function to continuously monitor the frame boundaries.

4.2 End-to-end delay

The extra end-to-end delay introduced by the system stays reasonable: 95 ms for the algorithmic delay of the 1.2 kbps speech coder plus 40 ms for the modulation/demodulation process give an overall extra delay of 135 ms in addition to the normal GSM speech channel delay. This is significantly less than the GSM data channel delay. As a result the proposed system provides a better quality of service than the existing systems, in addition to the improved security.

5. RESULTS ON GSM-TO-GSM VOICE CALLS

Table 1 shows the results obtained on GSM-to-GSM cross network voice calls on UK public networks, namely Vodafone and O2. GSM-to-GSM calls undergo double tandem speech transcoding. This is the most challenging scenario for the proposed system. The system works better on GSM-to-PSTN, PSTN-to-GSM, or PSTN-to-PSTN connections, because only one or no speech transcoding stage is involved.

TABLE 1: Results on GSM-to-GSM voice calls

                     Before channel decoding    After channel decoding
Interface            Rate (kbps)   BER (%)      Rate (kbps)   BER (%)   FER (%)
Digital/Analogue     3.0           2.9          1.7           0.40      1.8
Digital/Analogue     3.0           2.9          1.2           0.03      0.2

In order to avoid the potential problems associated with analogue interfacing at the transmitter side, a digital interface was simulated by copying a modulated waveform file onto a GSM handset using Bluetooth, and playing the file while on a call to a second GSM handset. The second handset was connected via hands free cables to a PC, which analyses the received signal and plays the speech in real time. This process transmits the modulated signal on a GSM-to-GSM call; however, the modulated signal was transferred to the handset as Bluetooth data.

A throughput of 3 kbps has been achieved with 2.9 % bit error rate (BER). With the addition of error correcting codes a throughput of 1.2 kbps with 0.03 % BER and 0.2 % frame error rate (FER) has been derived. A 1/2 rate convolutional code with a constraint length of 7 is used to derive the 1.2 kbps rate. The same code is used with puncturing to derive the 1.7 kbps rate. A modern 1.2 kbps proprietary speech codec [7][Villette et al 9] is used, providing communication quality speech across the secure voice channel. The speech codec can tolerate these error rates without noticeably degrading the output speech quality.

6. CONCLUSION

It is shown that end-to-end secure communication over the GSM voice channel is achievable. The secure voice system, which enables end-to-end secure voice communications over the GSM voice channel, compresses the speech to reduce the bit rate, may encrypt the resulting bit stream to provide security, and modulates it onto speech-like patterns so that it passes through the GSM speech transcoding process. A real time prototype system has been implemented. The secure voice system has been tested on GSM-to-GSM voice calls. A modern 1.2 kbps proprietary speech codec, which can tolerate these error rates, is used to produce communication quality speech. A demonstration of the system on a GSM-to-GSM call with the 1.2 kbps speech codec will be provided at the presentation.

7. REFERENCES

[1] C. C. Lo and Y. J. Chen, “Secure communication mechanisms for GSM networks”, IEEE Transactions on Consumer Electronics, Vol. 45, No. 4, pp. 1074-1079, November 1999.
[2] Michael Street, “Interoperability and international operation: An introduction to end to end mobile security”, IEE Secure GSM and Beyond: End to End Security for Mobile Communications, London, February 2003.
[3] P. Challans, R. Gover, and J. Thorlby, “End to end data bearer performance characterisation for communications over wide area mobile networks”, IEE Secure GSM and Beyond: End to End Security for Mobile Communications, London, February 2003.
[4] ITU-T Recommendation G.114, “One-way transmission time”, May 2000.
[5] N. Katugampala, S. Villette, and A. M. Kondoz, “Secure voice over GSM and other low bit rate systems”, IEE Secure GSM and Beyond: End to End Security for Mobile Communications, London, February 2003.
[6] A. M. Kondoz, “Digital speech: coding for low bit rate communication systems”, Wiley, New York, 1994.
[7] M. Stefanovic, Y. D. Cho, S. Villette, and A. M. Kondoz, “A 2.4/1.2 kb/s speech coder with noise pre-processor”, proceedings EUSIPCO 2000, Tampere, Finland, September 2000.
[8] ETSI Standard GSM 06.60, “Digital cellular telecommunications system; Enhanced Full Rate (EFR) speech transcoding”, March 1997.
[9] S. Villette, K. Al-Naimi, C. Sturt, A. M. Kondoz, and H. Palaz, “A 2.4/1.2 kbps SB-LPC based speech coder: the Turkish NATO STANAG candidate”, Proceedings of the IEEE Speech Coding Workshop 2002, Tsukuba, Japan, October 2002.
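As an illustrative aside to the channel coding mentioned in the results (a 1/2 rate convolutional code with a constraint length of 7), the following is a minimal Python sketch of such an encoder. The paper does not give the generator polynomials, the termination method, or the puncturing pattern; the (171, 133) octal generator pair and the zero-tail termination used here are assumptions chosen because they are widely used for this constraint length, and the function names are hypothetical. It is not the authors' implementation.

```python
"""Sketch of a rate-1/2, constraint-length-7 convolutional encoder."""

K = 7        # constraint length, as stated in the paper
G1 = 0o171   # ASSUMPTION: generator polynomials are not given in the paper;
G2 = 0o133   # (171, 133) octal is a commonly used pair for K = 7


def parity(x: int) -> int:
    """Return the modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1


def conv_encode(bits):
    """Encode a bit sequence, emitting two coded bits per input bit.

    K - 1 zero tail bits are appended to flush the shift register
    (an assumed termination scheme, not taken from the paper).
    """
    state = 0  # K-bit shift register, newest bit in the most significant position
    out = []
    for b in list(bits) + [0] * (K - 1):
        state = ((b & 1) << (K - 1)) | (state >> 1)
        out.append(parity(state & G1))
        out.append(parity(state & G2))
    return out


if __name__ == "__main__":
    frame = [1, 0, 1, 1, 0, 0, 1, 0]
    coded = conv_encode(frame)
    print(len(frame), "data bits ->", len(coded), "coded bits")
```

Every data bit produces two coded bits, which is why the usable throughput after channel coding is well below the 3 kbps raw modem rate. A punctured variant would delete some of these coded bits according to a fixed pattern to raise the net rate; that pattern is not specified in the paper and is therefore not sketched here.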