1. Introduction
Ionospheric scintillations are defined as phase and amplitude variations of trans-ionospheric radio signals, caused by irregularities and rapid fluctuations of the electron content in the upper layers of the atmosphere. In this work, we focus on the effect of scintillation on Global Navigation Satellite System (GNSS) signals. Whether the main purpose of the application under consideration is navigation, in which the user demands high-quality measurements and scintillation mitigation or rejection, or a scientific application, in which the user is interested in selecting scintillation events within huge datasets, a clean, fast, and precise detection of scintillation events is of paramount importance. The scientific literature on scintillations and their effects on GNSS is wide and mature, and a thorough review is outside the scope of this paper. We refer the reader to several sources to study the scintillation phenomenon in depth [1,2,3,4], as well as the related receiver design and signal processing techniques [5,6,7].
The idea of using machine learning techniques for scintillation detection is not new. The need for machine learning solutions stems from the fact that manual classification of scintillation events, based on visual inspection of scintillation index time series, is nowadays impractical, given the amount of data. Similarly, simple heuristic algorithms, while requiring barely any tuning time from a human expert, have limited performance. Examples are algorithms that decide whether scintillation is occurring based on simple threshold crossings of quantities such as the amplitude scintillation index (S4), the carrier-to-noise power spectral density ratio (C/N0), or the satellite elevation angle. It has been shown that machine learning algorithms can successfully replace and outperform these traditional detection techniques.
This approach was originally proposed in [8,9], where the authors use an automatic scintillation detector based on a supervised machine learning technique called the support vector machine (SVM), for amplitude and phase scintillation, respectively. This approach provides good detection results, but it introduces a Fast Fourier Transform (FFT) operation on the data over windows of 3 min, thus reducing the temporal resolution of the detection results. A second disadvantage is that the input data need to be pre-filtered with an elevation mask of 30° to reduce multipath false alarms, at the expense of potentially discarding useful and important information. An updated version of the algorithm is presented in [10]. Other works propose using decision tree and random forest algorithms on low-level signal observables, such as the in-phase and quadrature correlation outputs of the receiver tracking loop [11,12]. This approach reduces the false alarm rate due to the ambiguity between scintillation and multipath that is typical of classical approaches based on the analysis of the amplitude scintillation index, and it can provide an early run-time alert. Another advantage is that, by exploiting the high-rate correlator outputs, a finer time resolution in detection is obtained [13,14]. A more generic survey of data mining techniques is given in [15]. However, that method does not rely only on GNSS observations: it also includes external data, such as sensors and online forecast services.
The biggest problem with machine-learning-based classification is that datasets must be labeled manually by experts, which consumes precious time and thus prevents the large-scale deployment of the method [8,11]. The techniques described above only partially solve the problem, as they are based on fully supervised learning: a significant portion of the input data must still be manually labeled to assist the learning process of the algorithm. We propose instead to switch to a paradigm centered around semi-supervised learning, in which, in contrast to fully supervised learning, label information is not required for all the data, and some entries can thus carry an “unknown” label.
In recent years, deep learning based solutions [16] for classification tasks have seen tremendous improvements in terms of performance. Neural networks, however, have rarely been used as scintillation detection algorithms, one exception being [17]. Nevertheless, they have been successfully employed for the complex modeling of the electron content in the ionosphere [18,19].
Usually, deep learning based semi-supervised classifiers (see, for example, [20]) first reduce the dimensionality of the unlabeled data X using unsupervised learning, and then build a classifier using only the compressed features F of the datapoints for which the corresponding label Y is available, as depicted in the scheme of Figure 1.
In this paper, we make use of the DeepInfomax [21] approach, in which a variational representation of mutual information [22] is used to build an unsupervised, information-theoretic feature extractor. Differently from other unsupervised techniques such as Variational Autoencoders [20], which require both an encoder and a decoder, the considered method only needs an encoder, almost halving the complexity. Variational techniques powered by neural networks for the estimation of mutual information, and of divergences between probability distributions in general, have been widely used in the literature [22,23,24,25,26]. The complete pipeline we propose consists of first compressing the input time series into smaller-cardinality vectors, and then feeding these vectors to a binary classifier for scintillation detection. During training, the unsupervised compression is performed for all datapoints, whereas, obviously, the training of the classifier is performed only for the labeled datapoints.
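The split between unsupervised compression on all datapoints and supervised training on the labeled subset can be sketched as follows. This is a minimal illustration under our own conventions (the callable names and the use of −1 to mark “unknown” labels are assumptions, not part of the original pipeline):

```python
import numpy as np

def semi_supervised_cost(x, y, encode, unsup_cost, sup_cost):
    """Combine an unsupervised term computed on ALL datapoints with a
    supervised term computed only where a label exists (y == -1 marks
    'unknown' entries)."""
    f = encode(x)                      # compress every datapoint
    total = unsup_cost(x, f)           # e.g. a mutual-information objective
    labeled = y >= 0                   # mask selecting labeled entries
    if labeled.any():
        total += sup_cost(f[labeled], y[labeled])
    return total

# Toy usage with stand-in callables:
x = np.arange(6.0).reshape(3, 2)
y = np.array([1, -1, 0])               # middle datapoint is unlabeled
cost = semi_supervised_cost(
    x, y,
    encode=lambda x: x.sum(axis=1),
    unsup_cost=lambda x, f: 1.0,
    sup_cost=lambda f, lab: float(len(lab)),
)
```

The key point is that the supervised term only ever sees the masked subset, while the compression term sees everything.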
The aim of this paper is thus to develop a new scintillation detection strategy based on Convolutional Neural Networks and semi-supervised learning. In more detail, we extend the results and approaches presented in [11], in which the classification of scintillation events is performed using supervised decision trees. While showing good performance, that method has two fundamental problems. First, it lacks the capability of considering as a whole a single scintillation event sampled at different time instants, as the temporal dimension is not taken into account. Second, it requires a fully labeled dataset, whose annotation had to be carried out manually by a team of experts in the field. We instead propose a solution that mitigates both problems by using a system that naturally:
handles subsequent samples in time, by applying an overlapping time window to all features, thus including the information about the temporal correlation;
requires only a smaller portion of the dataset to be labeled.
We show that, to reach the same level of accuracy, only about 20% of the datapoints must be manually labeled. For a fair comparison, we used the same dataset as in [11] and measured the test classification performance using the same splitting of the dataset into training and testing parts. Moreover, when the number of labeled datapoints is increased, the proposed method reaches extremely high performance, with a top accuracy on the test set of 0.9993.
The paper is organized as follows. In Section 2, we discuss the need for unsupervised compression; in Section 3, we give a brief overview of the DeepInfomax approach, which is further expanded in Appendix A; in Section 4, we describe the considered datasets and their properties; in Section 5, we give an overview of the system architecture and of the training and testing procedures; in Section 6, we show the results of the proposed classification method; and, in Section 7, we qualitatively analyze the importance of the input features. Finally, in Section 8, we discuss the results and suggest possible directions for future work. Appendix B briefly describes the neural network architectures and the training technicalities.
3. The DeepInfomax Approach
In this section, we give a brief overview of the DeepInfomax approach [21]; Appendix A provides more details about the technique.
Figure 2 schematically depicts the idea. The goal is to build a two-stage compression of an input X, using a cascade of two neural networks that respectively generate a first equivalent feature H and then a second one, F, such that the Mutual Information (MI) between the input and the compressed representation is maximized. In practice, as also explained in [21], a proxy of the mutual information that worked better in practice, the Jensen-Shannon divergence, was used. The authors in [21] also introduce, and maximize, a new metric: the Local Mutual Information. It is defined as the sum of the mutual information terms between each element h_i of the variable H, a multidimensional random vector, and the whole feature vector F. In formulas, L(H; F) = Σ_i I(h_i; F), where I(A; B) denotes the mutual information (MI) between A and B.
The sketch of the architecture we consider in our work is depicted in Figure 3. We found that, for this work, considering only the local Mutual Information was the best choice in terms of performance. Moreover, a Uniform Prior regularizer, similar to the one described in [21], was included to ensure that the features F evenly cover the latent space.
It is possible to show ([21,22]; see also Appendix A) that a Local MI estimator can be built as depicted in Figure 4. Couples extracted from the joint and the marginal distributions of H and F are built. These couples are fed to two copies of the same estimator network, whose output is the input of the cost function described by Equation (A7). Notice that, to build couples sampled from the joint distribution, it is sufficient to observe joint realizations of the variables, while, to build samples whose distribution is the product of the marginals, we can randomly shuffle one of the two sets of variables, H or F, and build new couples with the scrambled versions.
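The shuffling trick, combined with the Jensen-Shannon-based bound used in DeepInfomax, can be sketched as follows. Here `critic` is a hypothetical callable standing in for the scoring network (not the paper's exact architecture), and the bound has the standard form E_joint[−softplus(−T)] − E_marginal[softplus(T)]:

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)  # numerically stable log(1 + exp(z))

def jsd_mi_proxy(h, f, critic, rng=None):
    """Jensen-Shannon-based proxy to the mutual information between H and F.
    Joint couples: (h_i, f_i) observed together.
    Marginal couples: (h_i, f_pi(i)) with f randomly shuffled, which breaks
    the pairing and simulates sampling from the product of the marginals."""
    if rng is None:
        rng = np.random.default_rng()
    perm = rng.permutation(len(f))
    t_joint = critic(h, f)         # scores on samples from p(h, f)
    t_marg = critic(h, f[perm])    # scores on samples from p(h) p(f)
    return np.mean(-softplus(-t_joint)) - np.mean(softplus(t_marg))

# Toy usage: a dot-product critic scores matched pairs higher when H and F
# are strongly dependent, so the proxy is larger than for independent data.
rng = np.random.default_rng(0)
h = rng.normal(size=(100, 4))
f_dep = h.copy()                                # perfectly dependent
f_ind = rng.normal(size=(100, 4))               # independent
critic = lambda a, b: (a * b).sum(axis=1)
val_dep = jsd_mi_proxy(h, f_dep, critic, rng)
val_ind = jsd_mi_proxy(h, f_ind, critic, rng)
```

In a real system the critic would itself be trained to maximize this quantity; here it is fixed purely for illustration.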
To perform the regularization instead (again in Figure 4), a discriminator network is fed with both uniformly generated random variables and the features F, and it is trained to distinguish between the two. The rest of the system is trained adversarially [27] to fool the discriminator, and it is possible to show that, at equilibrium, the distribution of F is the multivariate uniform distribution.
The whole process serves the purpose of extracting high quality and informative feature vectors F in an unsupervised way (i.e., no external labels are needed), which can be used for other downstream tasks such as classification or clustering.
4. The Data Collections
In this work, we use the same datasets as in [11]. The dataset is a multivariate 10-dimensional time series composed of the following features: the I and Q correlator outputs, sometimes referred to as GNSS raw data; the Signal Intensity (SI), computed as SI = I² + Q²; the raw phase φ of the received signal; the amplitude scintillation index S4; the C/N0; the satellite elevation and azimuth angles; the de-trended carrier phase; and the phase scintillation index σφ. I, Q, the raw phase and, consequently, the SI are provided at 50 Hz; the other observables are provided at 1 Hz. The overall dataset is then built by performing a moving average with an integration time of 60 s and downsampling the high-rate features to 1 Hz. Before processing the data with the neural networks, a standard normalization (feature-by-feature mean removal and division by the standard deviation) has been performed.
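As an illustration, a 50 Hz observable can be averaged, decimated, and standardized as follows. This is a sketch of the stated procedure under our own assumptions (the simple `convolve`-based moving average is our choice, not necessarily the authors' implementation):

```python
import numpy as np

FS_HIGH, FS_LOW, T_INT = 50, 1, 60          # Hz, Hz, s (values from the text)

def preprocess(x_high):
    """Moving average over a 60 s window, downsampling from 50 Hz to 1 Hz,
    then standardization (mean removal, division by standard deviation)."""
    w = FS_HIGH * T_INT                      # 3000 samples per window
    smoothed = np.convolve(x_high, np.ones(w) / w, mode="valid")
    low_rate = smoothed[:: FS_HIGH // FS_LOW]   # keep one sample per second
    return (low_rate - low_rate.mean()) / low_rate.std()

# Toy usage: five minutes of synthetic 50 Hz data.
rng = np.random.default_rng(0)
y = preprocess(rng.normal(size=50 * 300))
```

Each of the ten features would be processed feature by feature in this fashion before being fed to the networks.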
Data were collected in Hanoi, Vietnam (11°20’ N geomagnetic latitude), on 26 March and 2 April 2015. The measurements were recorded by means of a Software Defined Radio (SDR) receiver and data grabber, as detailed in [28,29]. The L1 C/A signals of 20 Global Positioning System (GPS) satellites were considered, corresponding to 169,955 entries recorded over a time interval of approximately 6 h, with a scintillation rate of 1:4. The ground truth labels were determined manually, based on visual inspection of the observables.
To avoid confusion, it is important to underline that all the datasets used in this work have been manually annotated, as a ground truth is needed to measure classification accuracy performance. In our experiments, we artificially remove some of the labels to simulate a semi-supervised learning scenario.
5. Overview of Functioning and System Description
As mentioned earlier, one of our goals is to harvest the information contained in the temporal correlation among the features. For this reason, we consider as input to the system sliding windows of the multivariate series built as described in Section 4. The input multivariate time series thus has cardinality equal to 10 or, in neural network language, the number of input channels is 10.
If we denote the sliding window size as W, we can formally define the input as the 10 × W matrix whose i-th row collects the samples x_i(n − W + 1), …, x_i(n), where n is the last time instant of the window and x_i is the i-th feature. We decided to use a causal time window, such that the method can be applied to a real-time detection scenario. If we instead allow for a delay, as in a post-processing application, we can define the window as centered on n, spanning the instants from n − (W − 1)/2 to n + (W − 1)/2, and the performance will improve. Notice that, in terms of coding, the change is trivial with respect to what we implemented: a simple offset in the indexing is sufficient to switch between the two modalities. Notice also that, for this second implementation, W must be odd. In this work, we consider the first variant and select W = 64 (consistent with the 63 s discarded at the beginning of each data collection; see Section 6). To test the performance of the proposed method, the entire dataset is split into a training part (90%) and a testing part (10%), using the same procedure as in [11].
During training, the sliding windows are processed and compressed, according to a variation of the DeepInfomax scheme [21] (outlined in Section 3 and expanded in Appendix A), into a feature vector. As previously introduced, this first stage of training is completely unsupervised; for our application, its purpose is to create the lower-cardinality equivalent feature vector that will be used as the input of the classifier. To make the picture clearer, the cardinality of the raw input is the size of the 10-dimensional sliding window, i.e., 10 × W entries, while, in this work, we found that a well-performing feature vector cardinality is 20. Then, for all the time instants for which the scintillation label is available, the classifier is trained with the corresponding feature vector. Notice that the training of the compression scheme and of the classifier is performed jointly, end to end, using backpropagation.
Figure 5 depicts the global architecture of the system used during training. In the spirit of focusing on the important high-level description of the method, all the details related to the actual network architectures and to the optimizers are relegated to Appendix B. As anticipated, the first part of the scheme is related to the unsupervised feature extraction, while the second part is devoted to the classification of the labeled datapoints. The whole procedure can be summarized as follows, considering batch-based training:
A batch of input time windows is sampled from the training set.
The inputs are processed by a first convolutional neural network, which performs a first-level compression and generates the variables H.
Through a second neural network, with both convolutional and fully connected layers, the final features F are built.
Starting from this point of the computation chain, three different parallel processes are performed:
- (i) classification of the feature vectors to determine whether a given time instant r corresponds to scintillation. To be completely precise, as explained in Appendix B, the actual input to the classifier is the concatenation of the feature vector F and of the outputs of skip connections taken from the layers of the first convolutional network. This step is performed only for the time instants for which a reference signal exists: the corresponding feature is fed to a fully connected classifier that receives as reference the scintillation/no-scintillation label for time instant r.
- (ii) local Mutual Information (MI) computation: starting from the collections of variables H and F, couples extracted from the joint and marginal distributions are built. These couples are fed to the estimator network, whose output is the input of the cost function defined in (A7).
- (iii) adversarial matching of the distribution of the features F to a target uniform distribution. Uniformly distributed random variables are generated, and a discriminator is fed with both the generated random variables and the features F, while the rest of the system is trained adversarially [27] to fool the discriminator. This acts as a regularizer, ensuring that the features F evenly cover the latent space.
Standard backpropagation is then performed; the considered cost function is L = L_class + L_MI + L_prior, where the three components correspond to the classification, local MI, and prior-matching terms described above.
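One training step can therefore be summarized as below. This is a high-level sketch with hypothetical callables standing in for the actual networks and loss terms; only the structure (classification plus local-MI plus prior-matching cost terms computed on top of a shared two-stage encoding, with −1 marking unlabeled entries) reflects the procedure in the text:

```python
import numpy as np

def training_step_cost(x, y, encode1, encode2, class_cost, mi_cost, prior_cost):
    """Forward pass for one batch, returning the scalar cost
    L = L_class + L_MI + L_prior. Here mi_cost is assumed to already be a
    minimization objective (e.g. the negative of the MI proxy)."""
    h = encode1(x)                 # first-level convolutional compression
    f = encode2(h)                 # final feature vectors
    labeled = y >= 0               # -1 marks unlabeled entries
    l_class = class_cost(f[labeled], y[labeled]) if labeled.any() else 0.0
    return l_class + mi_cost(h, f) + prior_cost(f)

# Toy usage with stand-in callables and a half-labeled batch:
cost = training_step_cost(
    np.ones((4, 3)), np.array([0, 1, -1, -1]),
    encode1=lambda x: 2 * x,
    encode2=lambda h: h.sum(axis=1),
    class_cost=lambda f, lab: float(len(lab)),
    mi_cost=lambda h, f: 0.5,
    prior_cost=lambda f: 0.25,
)
```

In the real system, backpropagation through this scalar updates all networks jointly, end to end.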
During testing, the only portion of the network that is used is the one related to classification, depicted for simplicity in Figure 6, which is clearly much simpler and lighter. In a realistic scenario, the training could then be performed offline on powerful machines, while the lighter scintillation detection system could subsequently be deployed on SDR receivers, where the constraints on computational capabilities and maximum power consumption are stricter.
6. Experiments
In this section, we present the experimental results of the DeepInfomax method compared with the Decision Tree proposed in [11], varying the number of labeled datapoints and comparing the classification accuracy with respect to that baseline. The dataset used for the experiments is the same as in [11], but we neither train nor test the method on the first 63 s of each data collection, due to the necessity of having a full window of samples. For each experiment, the labeled datapoints are randomly extracted from the training set. In the following, we present results in terms of standard machine learning performance indicators, which we define hereafter. Denote as C the true class to which a datapoint belongs (0 for non-scintillation, 1 for scintillation) and as Ĉ the estimated class, i.e., the output of the classifier. We can define four joint probabilities:
true positive: P_TP = P(Ĉ = 1, C = 1);
false positive: P_FP = P(Ĉ = 1, C = 0);
true negative: P_TN = P(Ĉ = 0, C = 0);
false negative: P_FN = P(Ĉ = 0, C = 1).
Derived from these four quantities are the following ones:
accuracy: P_TP + P_TN;
precision: P_TP / (P_TP + P_FP);
recall: P_TP / (P_TP + P_FN);
F-score: 2 · precision · recall / (precision + recall).
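These indicators can be computed directly from the predictions, as in the following straightforward sketch:

```python
import numpy as np

def classification_metrics(c_true, c_hat):
    """Accuracy, precision, recall and F-score for binary labels,
    computed from the four empirical joint probabilities."""
    p_tp = np.mean((c_hat == 1) & (c_true == 1))
    p_fp = np.mean((c_hat == 1) & (c_true == 0))
    p_tn = np.mean((c_hat == 0) & (c_true == 0))
    p_fn = np.mean((c_hat == 0) & (c_true == 1))
    precision = p_tp / (p_tp + p_fp)
    recall = p_tp / (p_tp + p_fn)
    f_score = 2 * precision * recall / (precision + recall)
    return p_tp + p_tn, precision, recall, f_score

# Toy usage: 8 datapoints, one missed detection and one false alarm.
c_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
c_hat = np.array([1, 1, 0, 0, 0, 0, 0, 1])
acc, prec, rec, f = classification_metrics(c_true, c_hat)
```

Note that the accuracy is simply the sum of the two "correct" joint probabilities, since the four quantities partition the data.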
Figure 7 reports the main result of this paper: a comparison between the proposed method and the decision-tree-based one, varying the number of labeled datapoints. For the sake of completeness, it is necessary to clarify that, for the decision tree classification algorithm, the number of labeled datapoints (the x-axis of Figure 7) coincides with the size of the dataset used for training, since there is no way to include unlabeled datapoints in its learning pipeline. With our approach, instead, the size of the training dataset is always the same (90% of the complete dataset, roughly 150,000 datapoints), but only a portion of the datapoints is labeled.
The improvement in terms of performance is evident. First, it is worth noticing that the proposed DeepInfomax algorithm outperforms the Decision Tree technique, both in terms of accuracy and of F-score, for the same number of labeled elements. Second, a much lower number of labeled datapoints is required when using DeepInfomax to attain similar levels of accuracy and F-score. As the number of labeled points increases, the accuracy reaches a top value on the test set of 0.9993 for DeepInfomax, compared to the value 0.9824 for the Decision Tree.
It is important to notice that, even if the results are excellent from a performance point of view, tests on many different datasets should be performed to attest to the usefulness of the method in a realistic application. It is possible, in fact, that the considered data collection contains biases on which the networks base their decisions. This problem, however, is complex and outside the scope of this paper; see [30].
Table 1 reports the results of the method for all the performance metrics introduced above. As expected, the performance improves as the number of labeled datapoints increases, rapidly saturating to excellent levels. It is interesting to notice that the amounts of false positives and false negatives are almost equivalent.
7. A Qualitative View on Feature Importance
Neural networks are known for being black-box models, from whose inner functioning it is extremely complex, if not impossible, to obtain human-understandable information. Several techniques have been proposed to determine and inspect the importance of the features in a network's decisions, usually with specific methods tailored to particular experiments and particular network architectures.
In this section, we present results on feature importance based on the simple but powerful random feature permutation technique (similar to the method introduced in [31]). The idea is to take a trained classifier, randomly shuffle the input features one by one, and record the decrease in performance. The measured decrease of each shuffling experiment with respect to the baseline (the unshuffled dataset) is used as a proxy for the importance of the shuffled feature. Notice that this investigation strategy somewhat lacks the capability of measuring the joint, higher-order importance of features. In fact, two or more features can be highly correlated and important for the considered task, while shuffling just one of them may not substantially impact performance, thus producing an underestimation of that feature's importance.
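A minimal version of this permutation test can be written as follows (our own sketch, with a generic `predict`/`accuracy` interface standing in for the trained network):

```python
import numpy as np

def permutation_importance(x, y, predict, accuracy, seed=0):
    """Per-feature drop in accuracy when that feature's column is shuffled.
    x: (n_samples, n_features). A larger drop marks a more important feature;
    note that strongly correlated features can mask each other, leading to an
    underestimation of their individual importance."""
    rng = np.random.default_rng(seed)
    baseline = accuracy(y, predict(x))
    drops = np.empty(x.shape[1])
    for j in range(x.shape[1]):
        x_perm = x.copy()
        rng.shuffle(x_perm[:, j])          # break the feature/label relationship
        drops[j] = baseline - accuracy(y, predict(x_perm))
    return drops

# Toy usage: column 0 fully determines the label, column 1 is pure noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
x = np.column_stack([y + 0.0, rng.normal(size=100)])
predict = lambda x: (x[:, 0] > 0.5).astype(int)
accuracy = lambda y, p: np.mean(y == p)
drops = permutation_importance(x, y, predict, accuracy)
```

Shuffling the decisive column destroys the accuracy, while shuffling the noise column leaves the predictions untouched.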
In Figure 8, we report the results of the permutation tests on the classifier trained with 10% of labeled examples. Despite the possible limitations of the considered measurement technique, we obtain results that are consistent with the physical knowledge of the problem, as well as with previous results [11].
In order of importance (computed as the decrease in accuracy with respect to the baseline), the most informative feature is the Signal Intensity. In the considered data collections, amplitude scintillation events are observed; it is thus reasonable that the SI is the most important feature, with a drop in accuracy of more than 40% when it is randomly permuted. The elevation angle, the amplitude scintillation index S4, and the C/N0 induce basically the same drop in accuracy when permuted. The elevation angle is informative for classification: signal statistics for low-elevation satellites greatly differ from those of high-in-the-sky ones. The amplitude scintillation index S4 is clearly an important feature, since amplitude scintillation events are considered. Finally, the C/N0 is yet another indirect measurement of the signal intensity. Notice, moreover, that these are the three features used for the so-called semihard detection rule of [11]. The raw phase and the azimuth angle appear to contain less information about the scintillation events, and the remaining features have the lowest impact when randomized. It is possible that the relevant information contained in I and Q is already summarized in the SI while, since the observed scintillation events are linked to amplitude fluctuations, the phase scintillation information is not as important.
A complete analysis of feature importance would require retraining different versions of the classifier from scratch, completely excluding some of the features from the beginning of training; since this is computationally expensive and outside the scope of this paper, we leave this analysis for future work.