IMPLEMENTATION OF QIM-BASED AUDIO WATERMARKING USING A HYBRID SWT-DCT-SVD TRANSFORM OPTIMIZED WITH A GENETIC ALGORITHM

Nowadays, most data exchange takes place over the internet because it is easy and can be accessed anywhere. Files are often uploaded directly without any security check or scanning, which lets people upload illegal files or files they do not own. Such copyright violations are a serious problem because they reduce the owner's profit, and watermarking methods were developed to address it. Watermarking is a method of embedding secret information into host data, which can be audio, image, or video. This research designs an audio watermarking scheme that combines three transforms: the Stationary Wavelet Transform (SWT), the Discrete Cosine Transform (DCT), and Singular Value Decomposition (SVD). SWT splits the data into high- and low-frequency components. DCT then maps the correlated high-frequency data into uncorrelated coefficients. These coefficients are decomposed into three matrices U, S, and V using SVD, and the watermark is embedded into the S matrix. With these methods, the watermarked audio achieves an average SNR above 20 dB and an average BER of about 0.1170.


Introduction
Today, information can easily be obtained from and uploaded to the internet. Because the internet is a complex network of computers around the world, securing and monitoring all of its content is very difficult. Many irresponsible people exploit this weakness to share data that is not theirs for profit. Piracy usually targets multimedia files (audio, image, and video) and causes losses to the original owners of those files. This motivates research into watermarking methods.
Watermarking is a signal processing technique in which host data is embedded with information or a logo that identifies the data owner, protecting the data from piracy [2]. A watermarked audio signal should have the following qualities [1,2]: 1) Inaudible. The watermark should not affect audio quality, which requires a signal-to-noise ratio (SNR) of at least 20 dB; 2) Robust. The embedded information should survive attacks and manipulation by pirates; 3) Secure. The embedded watermark should not be extractable by anyone except its owner.
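The SNR used in the inaudibility requirement can be checked numerically. The sketch below assumes the common definition SNR = 10 log10(Σx² / Σ(x − y)²), where x is the host audio and y the watermarked audio; the paper does not spell out its SNR formula, and the helper name `snr_db` is hypothetical.

```python
import numpy as np


def snr_db(original, watermarked):
    """SNR of a watermarked signal relative to its host, in decibels (assumed definition)."""
    original = np.asarray(original, dtype=float)
    noise = np.asarray(watermarked, dtype=float) - original  # distortion added by embedding
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))


# Toy host and a copy distorted by 10% of its own amplitude
x = np.array([1.0, -1.0, 1.0, -1.0])
y = 1.1 * x
print(round(snr_db(x, y), 2))  # 20.0
```

A distortion at one tenth of the signal amplitude therefore sits exactly at the 20 dB boundary; smaller distortions give higher SNR.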
Transform-domain watermarking performs better than time-domain watermarking because the transform domain can better exploit the characteristics of the human auditory system. In this paper, three methods are combined, the Stationary Wavelet Transform (SWT) [3], the Discrete Cosine Transform (DCT) [4,5], and Singular Value Decomposition (SVD), to produce watermarked audio that is inaudible and robust against attacks such as noise addition, filtering, MP3 compression, and time and frequency modification.
Many researchers develop hybrid algorithms to achieve better performance. Chen and Zhu proposed a scheme that combines DWT and DCT [6], exploiting two characteristics of these transforms: the multiresolution property of the DWT and the energy compaction of the DCT. The embedding uses a zero-watermarking technique, which stores the watermark in a secret key rather than in the signal. Another study based on the same DWT-DCT hybrid algorithm proposed resisting synchronization attacks by inserting the watermark into high-energy, low-frequency components with adaptive quantization [7].
This research combines the three methods for the following reasons. SWT splits the data into two types, high and low frequency, and is used to protect the watermark from filtering attacks. DCT converts the data into the frequency domain and compacts its energy, which can reduce the resulting BER. The obtained DCT coefficients are then arranged into a matrix and processed with SVD to produce the U, S, and V matrices, which also reduces the BER. The S matrix is then used for embedding with the QIM method.
A genetic algorithm is used for optimization because each processing technique requires input parameters such as frame length, threshold, and number of bits. The genetic algorithm tries combinations of these parameters to find the one that produces the best result. The paper is structured as follows. Section 1, the introduction, gives a general explanation of the paper. Section 2 presents the theoretical basis of the proposed watermarking scheme. Section 3 describes the design and its step-by-step processes. Section 4 presents the results and analysis, and Section 5 concludes the paper.

Stationary Wavelet Transform
The wavelet transform decomposes data into two parts: high-frequency data and low-frequency data. The number of decomposition levels is usually chosen according to the application and the length of the original data. The data produced by this method are called SWT coefficients, and the original signal can be reconstructed from those coefficients through the inverse SWT.
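The SWT is a variant of the DWT that omits downsampling, so every sub-band keeps the length of the input signal. It is computed by passing the signal x through a set of filters. Starting from $a_0[n] = x[n]$, the approximation coefficients $a_{j+1}$ and detail coefficients $d_{j+1}$ at level j+1 are calculated with equations 1 and 2 [9], where the stride $2^{j}$ is equivalent to upsampling the low-pass filter g and the high-pass filter h by inserting $2^{j}-1$ zeros between their taps:

$$a_{j+1}[n] = \sum_{k} g[k]\, a_j[n - 2^{j}k] \quad (1)$$

$$d_{j+1}[n] = \sum_{k} h[k]\, a_j[n - 2^{j}k] \quad (2)$$

The signal is thus decomposed simultaneously by the low-pass filter g and the high-pass filter h. The high-pass output gives the detail coefficients and the low-pass output gives the approximation coefficients.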
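The following is a minimal sketch of equations 1 and 2, assuming the Haar filters g = [1, 1]/√2 and h = [1, −1]/√2 and circular boundary handling; the paper does not name its mother wavelet or boundary mode, and `haar_swt`/`haar_iswt` are hypothetical helpers. It shows that, unlike the DWT, every sub-band keeps the input length. Libraries such as PyWavelets provide a general `pywt.swt` for other wavelets.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_swt(x, levels):
    """Undecimated (stationary) Haar transform in a trous form, circular boundaries.

    Returns the final approximation and the detail sub-bands d_1..d_J,
    all with the same length as x.
    """
    a = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        shifted = np.roll(a, 2 ** j)          # shifted[n] = a[n - 2**j] (circular)
        details.append((a - shifted) / SQRT2)  # high-pass branch, equation 2
        a = (a + shifted) / SQRT2              # low-pass branch, equation 1
    return a, details


def haar_iswt(approx, details):
    """Invert haar_swt level by level: a_j = (a_{j+1} + d_{j+1}) / sqrt(2)."""
    a = np.asarray(approx, dtype=float)
    for d in reversed(details):
        a = (a + d) / SQRT2
    return a


x = np.sin(np.linspace(0, 4 * np.pi, 16))
approx, details = haar_swt(x, levels=3)
print(len(approx), [len(d) for d in details])      # 16 [16, 16, 16]
print(np.allclose(haar_iswt(approx, details), x))  # True
```

Three levels give three detail sub-bands plus one approximation, that is, four sub-bands of equal length.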
The SWT has been applied to noise removal in signals, pattern scanning, brain image classification, and brain disease detection.

Discrete Cosine Transform
As with other transforms, the Discrete Cosine Transform (DCT) aims to decorrelate audio data, so that each transform coefficient can be encoded independently without losing compression efficiency. This paper uses the one-dimensional DCT defined in equation 5 [7]:

$$y(k) = w(k) \sum_{n=0}^{N-1} x(n) \cos\left(\frac{\pi (2n+1) k}{2N}\right) \quad (5)$$

where y(k) is the one-dimensional DCT coefficient, x(n) is the original audio, N is the length of the audio, and k = 0, 1, 2, …, N−1. The corresponding inverse transform (IDCT) is given in equation 6:

$$x(n) = \sum_{k=0}^{N-1} w(k)\, y(k) \cos\left(\frac{\pi (2n+1) k}{2N}\right) \quad (6)$$

where n = 0, 1, 2, …, N−1. In both formulas, w(k) is defined in equation 7:

$$w(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & 1 \le k \le N-1 \end{cases} \quad (7)$$

As a result, the first transform coefficient, known as the DC coefficient, is proportional to the mean of the sample sequence ($y(0) = \sqrt{N}\,\bar{x}$). The DCT is commonly used in signal and image processing, especially in lossy compression, because of its strong energy compaction.
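Equations 5 to 7 can be written as a single orthonormal matrix, which makes the inverse a simple transpose. The sketch below builds that matrix with NumPy; `dct_matrix` is a hypothetical helper, and for real signals `scipy.fft.dct(x, type=2, norm="ortho")` computes the same transform without forming the matrix.

```python
import numpy as np


def dct_matrix(N):
    """Orthonormal DCT-II matrix with C[k, n] = w(k) cos(pi (2n + 1) k / (2N))."""
    n = np.arange(N)
    k = n[:, None]
    C = np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    w = np.full(N, np.sqrt(2.0 / N))
    w[0] = np.sqrt(1.0 / N)          # w(0) = sqrt(1/N), w(k) = sqrt(2/N) otherwise
    return w[:, None] * C


x = np.array([2.0, 4.0, 6.0, 8.0])
C = dct_matrix(len(x))
y = C @ x                            # forward DCT, equation 5
x_back = C.T @ y                     # inverse DCT, equation 6 (C is orthogonal)
print(np.allclose(x_back, x))                        # True
print(np.isclose(y[0], np.sqrt(len(x)) * x.mean()))  # True: DC term scales the mean
```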

Singular Value Decomposition
SVD is a factorization of a real or complex matrix. It generalizes the eigendecomposition to non-square matrices (matrices whose numbers of rows and columns differ). Let A be a real n x n matrix. Through SVD, A can be decomposed as in equation 8 [14]:

$$A = U S V^{T} \quad (8)$$

where U and V are orthogonal n x n matrices and S is a diagonal matrix whose diagonal elements are non-negative real numbers. These diagonal values are called the singular values of A. Equation 8 follows from the fact that $A^{T}A$ is symmetric, so its eigenvectors form an orthonormal basis. Let $x_i$ be an eigenvector of $A^{T}A$ with eigenvalue $\lambda_i \ge 0$, and define $\sigma_i = \sqrt{\lambda_i}$ and $r_i = A x_i / \sigma_i$ (assuming $\sigma_i > 0$). From these quantities we build the diagonal matrix S with $\sigma_i$ on its diagonal, the matrix U with the $r_i$ as its columns, and the matrix V with the $x_i$ as its columns. Multiplying U by S gives equation 9:

$$US = \left[\sigma_1 r_1, \ldots, \sigma_n r_n\right] = \left[A x_1, \ldots, A x_n\right] = AV \quad (9)$$

Multiplying US by $V^{T}$ then gives $USV^{T} = \sum_i A x_i x_i^{T} = A \sum_i x_i x_i^{T}$. Because the eigenvectors form an orthonormal basis, $x_i^{T} x_j = 0$ for $i \neq j$ and $x_i^{T} x_i = 1$. Hence V is orthogonal and $\sum_i x_i x_i^{T} = VV^{T} = I$, which gives $USV^{T} = AI = A$. The columns $r_i$ are also orthonormal, since $r_i^{T} r_j = x_i^{T} A^{T} A x_j / (\sigma_i \sigma_j) = \lambda_j x_i^{T} x_j / (\sigma_i \sigma_j)$, so U is orthogonal as well.
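The derivation can be checked numerically: build V from the eigenvectors of AᵀA, set σᵢ = √λᵢ and rᵢ = Axᵢ/σᵢ, and confirm that USVᵀ reproduces A. The matrix A below is an arbitrary invertible example (so every σᵢ > 0), and the result is compared against NumPy's `np.linalg.svd`.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])      # arbitrary invertible example (det = 46)

# Eigendecomposition of the symmetric matrix A^T A: orthonormal eigenvectors x_i
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]         # sort so that sigma_1 >= sigma_2 >= ...
lam, V = lam[order], V[:, order]

sigma = np.sqrt(lam)                  # sigma_i = sqrt(lambda_i)
U = (A @ V) / sigma                   # column i is r_i = A x_i / sigma_i
S = np.diag(sigma)

print(np.allclose(U @ S @ V.T, A))                             # True: A = U S V^T
print(np.allclose(U.T @ U, np.eye(3)))                         # True: U is orthogonal
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))  # True
```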

Quantization Index Modulation
QIM is a watermark embedding method that uses two or more quantizers, each with its own index [6]. The quantizers must be designed so that their output ranges do not overlap, so that the embedded value m can be identified uniquely during extraction. To achieve this, each quantizer's output range must be discrete (discontinuous), following the quantizer characteristic. Equation 10 gives the QIM embedding formula [9], and equation 12 gives the extraction formula [9], where $\hat{m}$ is an estimate of the information in the received signal.
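Because the QIM embedding and extraction equations are not reproduced here, the sketch below uses the standard binary dither-modulation form of QIM as an assumption: bit m selects one of two interleaved quantizers with dither d₀ = −Δ/4 and d₁ = +Δ/4, and extraction picks the quantizer whose lattice lies closest to the received value. The step Δ and the helper names are hypothetical.

```python
import numpy as np


def qim_embed(x, bits, delta):
    """Binary dither-modulation QIM: quantize x with the quantizer indexed by each bit."""
    d = np.where(np.asarray(bits) == 1, delta / 4, -delta / 4)  # dither d_m per index m
    return delta * np.round((np.asarray(x, float) - d) / delta) + d


def qim_extract(y, delta):
    """Minimum-distance decoding: pick the quantizer whose lattice is closest to y."""
    y = np.asarray(y, float)
    dist = [np.abs(y - qim_embed(y, np.full(y.shape, m), delta)) for m in (0, 1)]
    return (dist[1] < dist[0]).astype(int)


host = np.array([3.7, 12.2, 0.4, 7.9, 5.5])
bits = np.array([1, 0, 1, 1, 0])
marked = qim_embed(host, bits, delta=1.0)
print(qim_extract(marked, delta=1.0))                    # [1 0 1 1 0]
noisy = marked + np.array([0.2, -0.2, 0.1, -0.1, 0.24])  # each |noise| < delta/4
print(qim_extract(noisy, delta=1.0))                     # [1 0 1 1 0]
```

The two lattices are Δ/2 apart, so any perturbation smaller than Δ/4 still decodes correctly; a larger Δ buys robustness at the cost of more distortion.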

Genetic Algorithm
The genetic algorithm is inspired by the mechanisms of evolution and natural genetics [10] and is used to find the optimum parameters. It searches for them through selection, crossover, and mutation of a population, producing candidate solutions called chromosomes.
The components of a chromosome are called genes, which can be characters, symbols, binary digits, or numbers, depending on the problem to be solved. New chromosomes with different genes are produced continually in search of the best output, and this evolution proceeds generation by generation. Chromosomes with a high success rate are evolved further in the next generation. A chromosome's success rate is measured by a fitness function (FF), so chromosomes with high FF values are selected for the next generation [11].
Within a generation, new chromosomes are produced by crossover between existing chromosomes. The number of chromosomes that undergo crossover depends on the specified crossover probability. Chromosomes can also be produced through mutation, that is, by randomly altering one or more gene values without crossover; the specified mutation probability determines how many mutations occur. Eventually the population converges, and the converged chromosomes represent the best solution to the problem [11].
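A minimal sketch of this generation loop, under stated assumptions: binary chromosomes, tournament selection (the paper does not name its selection scheme), one-point crossover applied with probability `pc`, and independent bit-flip mutation with probability `pm`. The fitness in the usage line, counting 1-genes, is a toy stand-in, and keeping the best chromosome unchanged (elitism) is an added design choice.

```python
import random


def evolve(fitness, n_genes, pop_size=20, generations=50, pc=0.8, pm=0.05, seed=0):
    """Minimal binary GA: tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []                                   # best fitness in each generation
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        children = [elite[:]]                      # elitism: keep the best chromosome
        while len(children) < pop_size:
            # Tournament selection: the fitter of two random chromosomes becomes a parent
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            if rng.random() < pc:                  # crossover with probability pc
                cut = rng.randrange(1, n_genes)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each gene independently with probability pm
            child = [1 - g if rng.random() < pm else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness), history


best, history = evolve(fitness=sum, n_genes=12)
print(history[0] <= history[-1])  # True: elitism never loses the best fitness
```

Because the elite chromosome is copied into every new generation, the best fitness per generation can only stay the same or improve.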

Design
This paper proposes three audio watermarking designs that use a text-shaped image as the watermark data. All three use the same SWT-DCT-SVD transforms but differ in when framing is performed. The designs, called A, B, and C, are defined as follows:
• Design A: outside SWT and DCT + SVD + QIM
• Design B: outside SWT, inside DCT + SVD + QIM
• Design C: inside SWT and DCT + SVD + QIM
Here, outside means that framing is done after the SWT/DCT process, while inside means that framing is done before it. Section 3.1 explains the general embedding and extraction processes of the system, regardless of when framing is done.

Embedding Process
Embedding is the process of inserting the watermark into the host audio. It is carried out in the steps shown in figure 1.
Step 1: Before embedding, the m x n watermark image is converted into a vector W. The original audio file is sampled at 44,100 samples per second, and the sampled data is divided into N frames $A_i$. Together, the frames make up the entire sampled audio signal, as expressed by equation 13:

$$A = \sum_{i=1}^{N} A_i \quad (13)$$

Step 2: Apply the SWT to every frame $A_i$. This produces sub-bands of equal size, $D_s$ and A, where $D_s$ denotes the detail sub-bands (s denotes the number of sub-bands created) and A denotes the approximation sub-band (see figure 2).
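Step 1's framing can be sketched as non-overlapping frames that, concatenated, give back the whole signal, which is what equation 13 expresses. The frame length of 1024 samples is a hypothetical value (the paper optimizes the frame size with the genetic algorithm), the 440 Hz tone is a toy host, and returning leftover samples separately is an assumption about how a partial last frame is handled.

```python
import numpy as np


def frame_signal(audio, frame_len):
    """Split a 1-D signal into consecutive non-overlapping frames A_1..A_N.

    Trailing samples that do not fill a whole frame are returned separately so the
    original signal can be restored exactly by concatenation.
    """
    audio = np.asarray(audio)
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    remainder = audio[n_frames * frame_len:]
    return frames, remainder


fs = 44100                                   # sampling rate used in the paper
t = np.arange(fs) / fs                       # one second of audio
audio = 0.5 * np.sin(2 * np.pi * 440 * t)    # toy 440 Hz host signal
frames, rest = frame_signal(audio, frame_len=1024)
print(frames.shape, rest.shape)              # (43, 1024) (68,)
restored = np.concatenate([frames.ravel(), rest])
print(np.array_equal(restored, audio))       # True
```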
Step 3: Transform these sub-bands with the DCT to produce $D_x$ and $A_x$, the DCT-transformed versions of $D_s$ and A. These sub-bands are arranged into a DC matrix of size s x (L/2), where L is the frame length and s is the number of sub-bands created by the SWT. If four sub-bands are created, the DC matrix looks like figure 3.
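A sketch of Step 3, assuming the SWT sub-bands are already available. The paper gives the s x (L/2) size of the DC matrix but not how L/2 columns are taken from length-L sub-bands, so keeping the first L/2 DCT coefficients of each sub-band is an assumption, as are the helper names and the random example sub-bands.

```python
import numpy as np


def dct_ortho(x):
    """Orthonormal 1-D DCT-II (equations 5 and 7) computed with an explicit matrix."""
    N = len(x)
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n + 1) * n[:, None] / (2 * N))  # rows k, columns n
    w = np.full(N, np.sqrt(2.0 / N))
    w[0] = np.sqrt(1.0 / N)
    return (w[:, None] * C) @ x


def build_dc_matrix(subbands):
    """Stack DCT-transformed sub-bands row by row into an s x (L/2) DC matrix.

    ASSUMPTION: the first L/2 DCT coefficients of each length-L sub-band are kept;
    the paper gives the matrix size but not how the L/2 columns are selected.
    """
    rows = [dct_ortho(np.asarray(b, dtype=float))[: len(b) // 2] for b in subbands]
    return np.vstack(rows)


# Hypothetical input: s = 4 sub-bands (e.g. 3 detail + 1 approximation), L = 16
rng = np.random.default_rng(0)
subbands = [rng.standard_normal(16) for _ in range(4)]
DC = build_dc_matrix(subbands)
print(DC.shape)  # (4, 8)
```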
Step 4: Decompose DC using SVD (equation 14). This produces the matrices U, S, and V, which factorize DC; U and V are orthogonal and S is diagonal.

$$DC = U S V^{T} \quad (14)$$
where S is a 4 x 4 diagonal matrix (for four sub-bands) whose diagonal holds the non-zero singular values of DC. These values are used for embedding.
Step 5: Embed the watermark into the S matrix using QIM by modulating its values with the index m. The value of m ranges from 1 to 10, depending on the system input.
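Steps 4 and 5, together with extraction Steps 3 and 4, can be sketched on a single DC matrix. Assumptions: one watermark bit per singular value, the binary dither-modulation form of QIM with step Δ = 0.5 standing in for the paper's QIM equations, and a synthetic 4 x 8 DC matrix with known singular values 40, 25, 12, and 6; all helper names are hypothetical.

```python
import numpy as np


def qim(x, bits, delta):
    """Binary dither-modulation QIM (dither -/+ delta/4), an assumed stand-in."""
    d = np.where(np.asarray(bits) == 1, delta / 4, -delta / 4)
    return delta * np.round((x - d) / delta) + d


def qim_decode(y, delta):
    """Per value, choose the bit whose quantizer lattice lies closest to y."""
    err = [np.abs(y - qim(y, np.full(y.shape, m), delta)) for m in (0, 1)]
    return (err[1] < err[0]).astype(int)


def embed(DC, bits, delta):
    """Steps 4-5: DC = U S V^T, quantize the singular values with QIM, rebuild DC."""
    U, s, Vt = np.linalg.svd(DC, full_matrices=False)  # s is the diagonal of S
    return U @ np.diag(qim(s, bits, delta)) @ Vt


def extract(DC_marked, delta):
    """Extraction Steps 3-4: recompute the singular values and decode them with QIM."""
    return qim_decode(np.linalg.svd(DC_marked, compute_uv=False), delta)


# Synthetic 4 x 8 DC matrix (L = 16) with known singular values 40, 25, 12, 6
rng = np.random.default_rng(1)
Q1, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # 4 x 4 orthogonal
Q2, _ = np.linalg.qr(rng.standard_normal((8, 4)))  # 8 x 4, orthonormal columns
DC = Q1 @ np.diag([40.0, 25.0, 12.0, 6.0]) @ Q2.T

bits = np.array([1, 0, 0, 1])
DC_marked = embed(DC, bits, delta=0.5)
print(extract(DC_marked, delta=0.5))  # [1 0 0 1]
```

The step must stay small relative to the gaps between singular values: if quantization reordered them, the recomputed SVD would return them in a different order than they were embedded and the bits would be read back in the wrong positions.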

Extraction Process
Extraction is the process of retrieving the watermark data from the watermarked audio. It is the inverse of the embedding process and is shown in figure 4:
Step 1: Take the watermarked audio and sample it at 44,100 samples per second to avoid aliasing. Then divide the sampled data into frames, such that together the frames make up the entire audio signal.
Step 2: Perform the SWT on every frame to produce approximation and detail coefficients. Take the approximation coefficients and process them with the DCT to produce DCT coefficients.
Step 3: The resulting coefficients are organized into sub-bands according to the SWT output. Build a matrix from these sub-bands in the same way as in the embedding process, then factorize it using SVD to produce the U, S, and V matrices.
Step 4: Apply the QIM extraction equation to the S matrix to recover the watermark data.
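The extracted watermark is compared with the original using the bit error rate (BER) reported in the results. The sketch assumes the usual definition (number of differing bits divided by the total number of bits); the 4 x 4 watermark and the two flipped bits are made up for illustration.

```python
import numpy as np


def bit_error_rate(original_bits, extracted_bits):
    """Fraction of watermark bits that differ after extraction (0 = perfect recovery)."""
    original_bits = np.asarray(original_bits).ravel()
    extracted_bits = np.asarray(extracted_bits).ravel()
    if original_bits.shape != extracted_bits.shape:
        raise ValueError("watermark sizes differ")
    return np.count_nonzero(original_bits != extracted_bits) / original_bits.size


# Hypothetical 4 x 4 binary watermark image, flattened into the vector W
W = np.array([[1, 0, 0, 1],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]])
W_extracted = W.copy()
W_extracted[0, 0] = 0          # simulate two bit errors caused by an attack
W_extracted[3, 2] = 1
print(bit_error_rate(W, W_extracted))  # 0.125
```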

Genetic Algorithm Process
In this step, the watermarking parameters are optimized to produce the best output. The general optimization process with the genetic algorithm is shown in figure 5:
Step 1: Initialize the parameters of the algorithm: the number of generations, the number of individuals, and the desired mutation and crossover probabilities.
Step 2: Initialize the parameters to be optimized. In this paper they are the SWT decomposition level, the QIM quantization number, the frame size used in framing, the allowed audio-power threshold, and the audio quantization bit depth.
Step 3: After initialization, start the algorithm. It produces chromosomes (solutions) whose genes are the parameters to be optimized. The fitness function is calculated by embedding, attacking, and extracting. As explained in the theoretical basis, chromosomes with good fitness values are selected into the next generation and used to produce new chromosomes through crossover and mutation.
Step 4: When the desired number of generations is reached or the fitness function reaches 1, the algorithm stops producing chromosomes and outputs the optimized parameters.
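A sketch of Steps 1 to 4 as a search over discrete parameter options. The candidate values in `SPACE`, truncation selection, uniform crossover, single-gene mutation, and `toy_fitness` are all hypothetical; a real fitness would embed, attack, and extract, for example scoring 1 minus the mean BER. The defaults mirror the settings reported in the final test (300 generations, 20 individuals, crossover probability 0.8, mutation probability 0.5), and the loop stops early once fitness reaches 1.

```python
import random

# HYPOTHETICAL search space: the paper optimizes these five parameters,
# but the candidate values below are illustrative only.
SPACE = {
    "N": [1, 2, 3, 4],                  # SWT decomposition level
    "Nframe": [512, 1024, 2048, 4096],  # frame size
    "nbit": list(range(1, 11)),         # QIM quantization number
    "thr": [0.01, 0.05, 0.1, 0.2],      # allowed audio-power threshold
    "bit": [8, 16, 24],                 # audio quantization depth
}
KEYS = list(SPACE)


def decode(chrom):
    """Map a chromosome (one option index per gene) to a parameter dict."""
    return {k: SPACE[k][g] for k, g in zip(KEYS, chrom)}


def toy_fitness(p):
    """HYPOTHETICAL stand-in for embed -> attack -> extract: 1 minus a made-up BER."""
    ber = (abs(p["N"] - 3) + abs(p["nbit"] - 7)) / 20 + (p["bit"] != 16) * 0.1
    return 1.0 - ber


def optimize(fitness, generations=300, pop_size=20, pc=0.8, pm=0.5, seed=0):
    rng = random.Random(seed)

    def score(chrom):
        return fitness(decode(chrom))

    def rand_gene(i):
        return rng.randrange(len(SPACE[KEYS[i]]))

    # Steps 1-2: random initial population of parameter-index chromosomes
    pop = [[rand_gene(i) for i in range(len(KEYS))] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        if score(best) >= 1.0:                 # Step 4: stop once fitness reaches 1
            break
        parents = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        children = [best[:]]                   # elitism: carry the best forward
        while len(children) < pop_size:        # Step 3: crossover and mutation
            a, b = rng.sample(parents, 2)
            if rng.random() < pc:              # uniform crossover
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = a[:]
            if rng.random() < pm:              # re-draw one randomly chosen gene
                i = rng.randrange(len(KEYS))
                child[i] = rand_gene(i)
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return decode(best), score(best)


params, fit = optimize(toy_fitness)
print(params, fit)
```

Encoding each gene as an index into a list of allowed values keeps every chromosome valid after crossover and mutation, which is convenient when parameters such as frame size only make sense at specific values.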

Early System Test and Analysis
In the early system test, trials with different parameter combinations are conducted. These experiments analyze how N (decomposition level), Nframe (frame number), Nbit (number of QIM bits), thr (threshold), and bit (quantization depth) affect the watermarked audio. The experiments vary the least significant parameters first and then move to the more significant ones. Table 1 shows the initial parameters, and Table 2 shows the results of the early parameter test. The attack test results show that designs B and C are robust against LPF, BPF, noise addition, and resampling attacks, as shown by a BER of 0 that remains constant after these attacks, but both designs are weak against time and frequency modification and compression attacks.

Final System Test and Analysis
The early system test showed the designs' performance before and after attacks. In the final system test and analysis, all designs are optimized with the genetic algorithm to find the best parameter combination. After the best combination is obtained, each design is tested and compared with its early test results. The parameters optimized in this test are N, Nframe, nbit, thr, and bit.
Table 4. Design C robustness against attack
Table 5. Optimized parameters
The genetic algorithm searches for the optimum parameters using the following settings:
• Number of generations = 300
• Number of individuals = 20
• Crossover probability = 0.8
• Mutation probability = 0.5
• Audio = rock.wav
• Optimized attacks = pitch shifting and MP3 compression
Rock audio and the pitch shifting attack were chosen because they produce the worst BER. MP3 compression was chosen because it is the most common attack in practice. The optimization shows that design C (mean BER after attack = 0.1170) is more robust than design B (mean BER after attack = 0.1553). Table 5 shows the obtained optimized parameters. The most robust audio is instrumental.wav, with a mean BER of 0.0863, and Figure 6 shows its high resistance against attacks, although it is still weak against pitch shifting and MP3 compression. Table 6 shows the results of attacks against the optimized instrumental.wav. The optimization makes the audio more robust against time-scale modification and pitch shifting up to a certain level, which indicates that optimization improves robustness.

Conclusion
The tests show that, of the three designed systems, design B has the best performance. They also show that framing after the DCT degrades the watermarking result, as design A never achieved BER = 0, and that framing before both transforms produces slightly worse output than framing after the SWT. The watermarking scheme with the three transforms SWT-DCT-SVD produces watermarked audio with a mean BER of 0.1170, an ODG of 0.15, and an SNR of 30 dB. Finally, optimization lowers the BER under pitch shifting, time-scale modification, and MP3 compression and yields better audio quality than before optimization.