A comparison study of data-driven anomaly detection approaches for industrial chillers

Faults in industrial chiller systems can lead to higher energy consumption, increased wear of system components and shortened equipment life. Because faults cause anomalous operating conditions only gradually, modern automatic fault detection models aim to detect them at low severity using real-time sensor data. Many scientific contributions have addressed this topic and presented data-driven approaches to detect faulty system states. Although many promising results have been presented to date, there is a lack of suitable comparison studies that show the effectiveness of the proposed models using data stemming from different chiller systems. Therefore, this study aims to identify a suitable data-driven approach that detects faults reliably in different domains of industrial chillers. To this end, a unified procedure is developed to train all algorithms in an identical way on the same data basis. Since most of the reviewed papers used only one dataset for training and testing, the selected approaches are trained and validated on two different datasets from real refrigeration systems. The data-driven approaches are evaluated based on their accuracy and true negative rate, from which the most suitable approach is derived as a conclusion.


Introduction
Chiller systems are used in many industries, and around 14% of German electric power consumption is attributable to cooling processes [1]. Faulty system states in industrial refrigeration systems can lead to inefficient operating conditions, increased energy consumption or even wear and tear of system components. Since fault states usually develop successively, plant monitoring systems are an important component for detecting fault conditions on the basis of sensor data and for initiating appropriate corrective action as early as possible [2]. In this way, faults can be prevented before they cause high energy wastage, damage or unscheduled downtimes, thus increasing the coefficient of performance (COP) of the plant and improving the deployment of maintenance workers [3]. In the future, machine-learned software is to be used to autonomously monitor real-world systems and detect deviations at an early stage in order to initiate appropriate countermeasures, if necessary. To this end, the development of sophisticated algorithms has been the subject of research for decades, and many solutions exist that show promising results [4]-[7]. To strengthen the performance of the resulting model in detecting fault states, a preselection and comparison of these algorithms in terms of resource requirements and precision is necessary.
The aim of this paper is to identify a suitable data-driven machine-learning approach that can be used in different domains of vapor-compression refrigeration systems. In the following, scientific publications in the field of autonomous fault detection in process plants are reviewed, followed by a criteria-based preselection and a short explanation of the selected approaches. Training these data-driven models appropriately requires a data basis, which is presented subsequently. This is followed by a comparison of the approaches, while the conclusion presents the results and summarizes the work.

Related works
In order to make a suitable selection of data-driven methods for chiller fault detection, this section provides an overview of scientific contributions from recent years that focus on anomaly detection in chillers. As part of the data pre-processing, principal component analysis (PCA) has proven to be a very effective method for feature extraction, and many authors have demonstrated promising results in combination with a suitable classifier [4], [5], [8], [9]. Han et al. [8] used PCA in combination with a multiclass support vector machine (PCA-SVM), using a custom dataset containing both normal and faulty data instances and training with both types of data instances. They showed that the support vector machine (SVM) yields higher classification accuracy when the data are pre-processed through PCA, and that it can also be trained much faster due to the reduced dimensions [8]. However, this approach has limited applicability in practice, since datasets covering possible fault cases of refrigeration plants are rarely available. For this reason, Zhao et al. [9] chose a similar approach, using a support vector data description (SVDD) classifier trained in the principal component space.
A similar approach in this regard is PCA in conjunction with a one-class support vector machine (OCSVM), which has been applied to chemical processes [10], [11] as well as to industrial refrigeration applications by Li et al. [5]. One important aspect is that the model was trained using only labelled data stemming from the normal operating condition of the chiller. For this purpose, the most relevant principal components are derived from the principal component subspace (PCS) and used as input parameters for the OCSVM; we refer to this approach as PCA-PC-OCSVM in the following. As shown by many studies, the RBF kernel yields decent results. Because only data from normal states of the plant are used for training, the trained OCSVM can classify unknown observations as normal or abnormal. Furthermore, it has been shown that the SVDD and the OCSVM produce exactly the same outcomes when the RBF kernel is used for data transformation [12].
Beghi et al. [13] implemented a similar approach using the same dataset, with the crucial difference that the most relevant principal components are removed before training and the classifier is fitted in the residual component subspace (RCS) spanned by the residual components (PCA-R-OCSVM). They demonstrated that chiller faults can be reliably detected in the RCS. A similar approach was also followed by Li et al. [5], who applied an SVDD classifier. Another approach relying on principal component analysis is the PCA-T²-SPE according to Beghi et al. [4]. There, Hotelling's T² distribution of the PCS and the squared prediction error (SPE) of the residual space were used for fault detection and diagnosis. Both Li et al. [5] and Beghi et al. [13] conclude that models trained in the RCS may yield higher classification performance than models trained in the PCS.
In general, SVMs are widely used in the field of fault detection, for example in [14], where an SVM classifier is used to classify multiple fault cases. There, a genetic algorithm was applied to detect characteristic features (CFs). CFs are features by which the occurrence of fault cases can be characterized particularly well. Yan et al. [15] published an approach in which the dataset was mapped using an auto-regressive model with exogenous inputs (ARX) after feature selection. The resulting AR coefficients are used by an OCSVM for classification. Three years later, the authors published another promising approach [7] based on an extended Kalman filter and a recursively working OCSVM (EKF-ROSVM). The OCSVM is trained on AR coefficients derived from an ARX model, which was fitted to data filtered by the EKF, and refines itself during the testing phase using observations classified as normal. Furthermore, the authors propose to extract CFs in a two-step approach that combines Relief with a genetic algorithm. Other approaches also consider the application of artificial neural networks, such as in [16]. However, that model primarily focuses on the detection of sensor faults rather than anomalous system behaviour.
It should be noted that all PCA-based approaches as well as the EKF-based approaches rely on the same dataset, which was collected by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) within the 1043-RP study, with the exception of the paper by Beghi et al. [4], which used its own dataset. This suggests a general lack of appropriate datasets as well as of evaluations of algorithms on data other than the ASHRAE dataset. Therefore, this study compares a selection of models based on two datasets: on the one hand, the data from the ASHRAE study 1043-RP by Comstock et al. [17], and on the other hand, data from a previous project presented in [18]. In order to select some of the presented approaches, it is necessary to define selection criteria. Since chiller fault data are rarely available from real refrigeration systems, the selected approaches should be trainable with only one class of data (data from the normal operating condition). In addition, the selected approaches should have been used in previous works on refrigeration systems. Accordingly, fault detection is considered in this study, while fault diagnosis is not specifically addressed. Although many approaches exist in the literature, only the most recent papers are considered throughout this study, i.e. the ones published within the last decade. Table 1 contains the most recent papers presented in this section, where the approaches selected on the basis of these criteria are highlighted in gray.

Methodological approaches
The previously selected data-driven models are presented in more detail in this section. It should be noted that three of these models employ PCA for dimensionality reduction. PCA is an unsupervised feature reduction method that decomposes the given data into the modelled principal component space (PCS) and the unmodelled residual component space (RCS), such that the first principal components (PCs) represent most of the data variability. Feature extraction or feature reduction is necessary since training data-driven models on high-dimensional datasets leads to high computational costs and memory requirements [13]. Moreover, high-dimensional input data can lead to poor understanding of the resulting model [19, p. 32]. In order to reduce the original number of dimensions, either a fixed number of principal components is selected, or the number is determined by a variance threshold that the selected principal components should represent. All of the selected PCA-based models split the resulting feature space into a PCS and an RCS, where the PCA-PC-OCSVM uses the PCS for training and testing the OCSVM. Both remaining approaches use the RCS for classifier fitting; the PCA-T²-SPE only requires the SPE for fault detection, while the T² distribution is mainly necessary for fault diagnosis [4]. The subdivision of the principal component space is based on the cumulative percentage variance (CPV) of the first k PCs, so that $\mathrm{CPV}(k) \geq \mathrm{CPV}_k$ applies. While Beghi et al. [4] propose $\mathrm{CPV}_{k_1} = 95\%$, Li et al. [5] suggest setting this value to $\mathrm{CPV}_{k_2} = 85\%$. The EKF-ROSVM is the only approach that does not use PCA as feature extraction method; instead, it uses the ReliefF algorithm in combination with an adaptive genetic algorithm (AGA) to select CFs [7].
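The CPV-based split described above can be illustrated with a minimal NumPy sketch. It is not the implementation of [4] or [5]: the function name `split_pca_subspaces`, the standardization step and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def split_pca_subspaces(X, cpv_threshold=0.95):
    """Split the PCA loadings of X (n_samples x n_features) into PCS and RCS.

    Returns (P_pcs, P_rcs, cpv), where the columns of P_pcs are the first k
    loading vectors and k is the smallest number with CPV(k) >= cpv_threshold.
    """
    # Standardize each feature (assumes no feature is constant).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Eigen-decomposition of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Cumulative percentage variance of the first k components.
    cpv = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cpv, cpv_threshold)) + 1, len(eigvals))
    return eigvecs[:, :k], eigvecs[:, k:], cpv

# Synthetic stand-in for sensor data: 6 correlated features driven by 2 factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(300, 6))

P95, R95, cpv = split_pca_subspaces(X, cpv_threshold=0.95)  # threshold of [4]
P85, R85, _ = split_pca_subspaces(X, cpv_threshold=0.85)    # threshold of [5]
print(P95.shape[1], P85.shape[1])  # number of retained PCs per threshold
```

A lower threshold can never retain more components than a higher one, so the 85% setting yields an RCS that is at least as large as the one obtained with 95%.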
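Every approach except the PCA-T²-SPE uses a form of OCSVM for classification. SVMs and their variants have proven very popular in anomaly detection: they are used in around 65% of all scientific contributions aiming at fault detection in refrigeration systems with data-driven methods [20]. The OCSVM is a variant of the SVM for novelty detection in which only one class is used for model training. The optimisation problem [12] is given as:
$$\min_{w,\,\xi,\,\rho} \; \frac{1}{2}\|w\|^2 + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i - \rho \tag{1}$$
$$\text{subject to} \quad (w \cdot \Phi(x_i)) \geq \rho - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \dots, l \tag{2}$$
where l is the number of observations, w a vector orthogonal to the separating hyperplane, ρ the offset of the hyperplane and Φ the feature map into the kernel-induced feature space, while the slack variables ξ_i capture margin errors. ν is a parameter bounded between 0 and 1 and represents the fraction of allowed outliers. Schölkopf et al. [12] showed the decision function for an OCSVM in (3):
$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i \, k(x_i, x) - \rho\right) \tag{3}$$
where α_i are the Lagrange multipliers of the dual problem and k is the kernel function. Since this decision function depends on the data only through kernel evaluations, it offers the advantage that the kernel trick can be applied.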
As shown in many previous papers [4]-[8], the RBF kernel function shows superior results in chiller fault detection and diagnosis (FDD) applications. It is given as $k(x, x') = \exp(-\gamma \|x - x'\|^2)$. In general, this leads to two parameters to be optimised during the training process, namely ν and γ, where the latter tunes the kernel bandwidth while the former defines the fraction of outliers. In order to find the optimal decision boundary, an optimal combination of these parameters must be found, which is discussed in the section Parameter Optimization.
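To make the role of γ concrete, the following sketch evaluates the RBF kernel and a kernel-based one-class decision rule of the form sgn(Σ α_i k(x_i, x) − ρ). The support vector, the coefficient `alpha` and the offset `rho` are hypothetical placeholder values chosen for illustration; in practice they result from training the OCSVM.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip rounding noise

def decision(x, support_vectors, alpha, rho, gamma):
    """Sign of sum_i alpha_i * k(x_i, x) - rho: +1 = normal, -1 = anomalous."""
    score = rbf_kernel(x, support_vectors, gamma) @ alpha - rho
    return np.where(score >= 0.0, 1, -1)

# Placeholder model with a single support vector at the origin (not trained).
support_vectors = np.array([[0.0, 0.0]])
alpha = np.array([1.0])
rho = 0.5

queries = np.array([[0.0, 0.0], [2.0, 0.0]])
narrow = decision(queries, support_vectors, alpha, rho, gamma=1.0)
wide = decision(queries, support_vectors, alpha, rho, gamma=0.1)
print(narrow, wide)
```

With the larger γ the kernel decays quickly and the point at distance 2 falls outside the accepted region, whereas the smaller γ widens the region enough to accept it. This is the bandwidth effect that makes the joint tuning of γ and ν necessary.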
The model proposed by Beghi et al. [4] detects faults by calculating the SPE in the RCS, whereby an observation is considered normal if $\mathrm{SPE}(x) \leq \delta^2$, where $\delta^2$ is the control limit, the determination of which is described in [4]. While the T² distribution serves fault diagnosis, the SPE fulfils the fault detection task considered in this paper.
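The SPE test can be sketched as follows. Note the assumption: the control limit of [4] is not reproduced here; an empirical quantile of the training SPE is used as a simple stand-in. All function names and the synthetic data are illustrative.

```python
import numpy as np

def fit_loadings(Z, k):
    """First k PCA loading vectors (columns) of standardized data Z."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:k].T

def spe(Z, P):
    """Squared prediction error: squared norm of each row's part outside span(P)."""
    residual = Z - Z @ P @ P.T
    return np.sum(residual**2, axis=1)

def empirical_control_limit(spe_train, quantile=0.99):
    """Assumed stand-in for the control limit of [4]: a quantile of the training SPE."""
    return float(np.quantile(spe_train, quantile))

# Synthetic normal-operation data: 4 features driven by 2 latent factors.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 4)) + 0.01 * rng.normal(size=(500, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

P = fit_loadings(Z, k=2)
limit = empirical_control_limit(spe(Z, P))

# Hypothetical faulty observation: a normal sample pushed off the PCS.
direction = rng.normal(size=4)
direction -= P @ (P.T @ direction)      # keep only the residual-space part
direction /= np.linalg.norm(direction)
z_fault = Z[:1] + 5.0 * direction
print(spe(z_fault, P)[0] <= limit)      # is the observation considered normal?
```

Because the shift lies entirely in the residual space, it is invisible to the retained principal components but raises the SPE well above the limit.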

Procedure
To make the results of the different data-driven approaches comparable, an abstract data-flow model has been developed, shown in Figure 1, so that all algorithms are trained using the same data and procedure.
Two datasets with a balanced number of normal and faulty observations are extracted, one from each source dataset, to make the results of the models comparable. Accordingly, all of the following steps are performed for each dataset. The function 'data pre-processing' covers all pre-processing steps of the approaches, such as filtering, normalization, steady-state detection, feature extraction or feature selection. Steady-state detection is a pre-processing step that filters out transient operating conditions, which can include overshoot and undershoot phenomena that may decrease the detection performance. Therefore, transient data are filtered out, with the filtering process following Comstock et al. [17]. For validation purposes, the dataset is split into a training and a test dataset. While the former is used for training the respective model, the latter serves solely for validation purposes. In the semi-supervised setting, the EKF-ROSVM model [7] refines itself with self-labelled datapoints that it detected as normal, even after model training. This refinement is denoted as recursion in Figure 1 and drawn with a dashed line, because not all approaches use recursion. The output of the data-flow diagram is the validation matrix of the models, containing the performance of each approach for every fault case, as well as the trained and validated model, which can be used to classify unlabelled datapoints and to compare against other models.

Parameter Optimization
In the field of machine learning, many applications require some form of (hyper-)parameter optimization for tuning the respective model parameters. Three of the selected data-driven approaches are based on an OCSVM, whose parameters γ and ν have to be optimized. In this paper, a grid search in combination with 5-fold cross-validation is used, which has proven to be a reliable optimizer with good generalization performance that prevents overfitting of the classifier [14].
In 5-fold cross-validation, the training set is randomly split into five equally sized subsets, and each subset is used once to validate the classifier trained on the union of the remaining four subsets. With training and testing performed by cross-validation, each parameter combination is rated by the mean performance over the five different training sets. Every approach is optimized twice (one model for each dataset) on the training dataset with the selected features. The model with the best-performing parameter set is selected and validated afterwards. The resulting γ, ν and the corresponding F1-score for each OCSVM-based method on both datasets are shown in Table 3.
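The search procedure can be outlined in a few lines of plain Python. This is only a structural sketch: `fit_and_score` is a hypothetical callback that would train a classifier with the given (γ, ν) on the training folds and return its score on the validation fold, and `toy_score` below is an artificial placeholder that ignores the data, not a real classifier.

```python
import random
from itertools import product
from statistics import mean

def k_fold_indices(n_samples, k=5, seed=0):
    """Randomly partition range(n_samples) into k folds of (almost) equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search_cv(samples, gammas, nus, fit_and_score, k=5, seed=0):
    """Return ((gamma, nu), score) with the best mean validation score."""
    folds = k_fold_indices(len(samples), k, seed)
    best_params, best_score = None, float("-inf")
    for gamma, nu in product(gammas, nus):
        scores = []
        for i, val_idx in enumerate(folds):
            # Train on the union of all folds except fold i, validate on fold i.
            train = [samples[j] for f, fold in enumerate(folds) if f != i for j in fold]
            val = [samples[j] for j in val_idx]
            scores.append(fit_and_score(gamma, nu, train, val))
        avg = mean(scores)
        if avg > best_score:
            best_params, best_score = (gamma, nu), avg
    return best_params, best_score

def toy_score(gamma, nu, train, val):
    """Placeholder score with a known optimum at gamma=0.1, nu=0.05."""
    return -(abs(gamma - 0.1) + abs(nu - 0.05))

samples = list(range(23))  # stand-in for the training observations
best_params, best_score = grid_search_cv(
    samples, gammas=[0.01, 0.1, 1.0], nus=[0.01, 0.05, 0.1], fit_and_score=toy_score
)
print(best_params)
```

The grid is evaluated exhaustively, so the cost grows with the product of the numbers of candidate values for γ and ν times the number of folds.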

Datasets
Representative datasets are necessary for appropriately training data-driven anomaly detection models. The data basis consists of the datasets from the ASHRAE study 1043-RP [21] and from a project of the Technical University of Applied Sciences Wildau [18]. Both datasets were compiled to study the impact of different fault types at multiple severity levels (SL). The ASHRAE project [21] is based on a centrifugal chiller system with R134a as refrigerant, while the second dataset, referred to as SmCoCo, is based on a screw compressor with ammonia (R717) as refrigerant.
Besides the benchmark test runs, in which the normal operating scenarios were investigated, both studies present data gathered from real fault states in the test chillers, including excess oil (exOil), reduced water flow in the evaporator (rVE) and condenser (rVC), non-condensables in the refrigerant circuit (NC) and a simulated refrigerant leak (RL). For simplicity, the ASHRAE dataset is abbreviated DS1 and the screw compressor dataset is abbreviated DS2 in the following. Theoretically, the PCA-based approaches do not depend on prior feature selection, since PCA itself performs feature extraction, i.e. it derives new features from the dataset. However, since all selected approaches could increase the performance of their classifier through feature selection, a Relief-based feature selection [7] is performed in this paper as well. For this purpose, a Relief algorithm is applied to a set of preselected CFs used in previous papers in order to identify the most significant ones, and the top third of the most influential features is selected. The EKF-ROSVM uses a more advanced approach and applies an AGA to the Relief-based feature selection in order to eliminate redundancies from the feature set [7]. The results of this analysis are listed in Table 3. As already mentioned, fault cases were simulated on both refrigeration systems. All fault cases were performed at different SLs ranging from SL1, with only a slight impact on the overall chiller operating conditions, to SL4, representing serious fault effects.

Model-Evaluation
The evaluation of the data-driven approaches is performed on the test dataset extracted after the data pre-processing, in order to examine the generalisation ability of the models on unseen observations. The test data are classified by the trained models, and the classification is compared to the real class of the observations. Thus, the performance of the fault detection approaches can be evaluated separately by dataset, fault case and SL. Figure 2 shows the accuracy of all eight trained models, differentiated by the underlying dataset. It can be seen from the figure that the models vary considerably in their overall accuracy across the datasets. In particular, the PCA-based models perform better on DS1 than on DS2, with the exception of the PCA-PC-OCSVM, which performs slightly better on DS2 than on DS1. The EKF-ROSVM is the only approach showing promising performance on both datasets, with accuracies higher than 90%, which suggests good transferability to other refrigeration systems. The performance of the PCA-T²-SPE and the PCA-R-OCSVM seems to be very dataset dependent, with the latter showing the highest fault detection accuracy on DS1. However, in practice, one may be more interested in how reliably faults are correctly classified. Therefore, the true negative rate (TNR) of the models, here the fraction of faulty observations that are classified as faulty, may be more meaningful and is shown in Figure 3. Figure 3 makes clear that faults present at lower SLs are detected less reliably than those at higher SLs. This might be explained by the fault characteristics becoming more pronounced with higher SL. In addition, the faults appear to be detectable with varying degrees of reliability, which becomes particularly evident in the case of the fault NC, which was dependably detected in both datasets across all severity levels. The detection of other fault cases, such as rVE, appears to be a challenging task for some approaches even at higher severity levels.
Interestingly, the PCA-T²-SPE model is the only approach that avoids mapping the data into a higher-dimensional feature space by means of kernels. Although this avoids the introduction of further parameters that have to be tuned with appropriate search methods, it becomes apparent that the model is outperformed by the other models. Nonetheless, it might be highly favourable if no labelled fault samples are available from the target chiller, as it bypasses the computationally expensive parameter tuning required by the other models. Thus, it appears that faults can be detected more reliably by introducing a non-linear mapping of the available observations into a higher-dimensional feature space. However, this also introduces additional parameters to be optimized, which is somewhat disadvantageous in terms of computational complexity and the limited applicability of suitable search strategies. The EKF-ROSVM presents a remarkably high fault detection rate, especially at lower SLs, with a TNR always larger than 70% and accordingly stable behaviour over all fault cases. The PCA-PC-OCSVM shows a reduced performance, with TNRs between 12% and 92%, and is outperformed in every fault case by at least one other approach.
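The two evaluation measures used above can be computed from the classifier output as sketched below. The sketch assumes the usual one-class convention that normal observations are labelled +1 (positive class) and faulty observations −1 (negative class), so that the TNR measures the share of faulty observations that are recognised as faulty. The label vectors are made-up illustrative values, not results from the study.

```python
def confusion_counts(y_true, y_pred, normal=1, faulty=-1):
    """Count TP, FN, TN, FP with 'normal' as positive and 'faulty' as negative class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == normal and p == normal for t, p in pairs)
    fn = sum(t == normal and p == faulty for t, p in pairs)
    tn = sum(t == faulty and p == faulty for t, p in pairs)
    fp = sum(t == faulty and p == normal for t, p in pairs)
    return tp, fn, tn, fp

def accuracy(y_true, y_pred):
    """Share of all observations that are classified correctly."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + tn + fp)

def true_negative_rate(y_true, y_pred):
    """Share of faulty observations that are classified as faulty."""
    _, _, tn, fp = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)

# Illustrative labels: four normal and four faulty observations.
y_true = [1, 1, 1, 1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1, 1, 1]
print(accuracy(y_true, y_pred), true_negative_rate(y_true, y_pred))
```

In this example the accuracy is 5/8 while only half of the faults are detected, which illustrates why accuracy alone can hide a weak fault detection rate on balanced data.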

Conclusion
The work at hand compared different data-driven approaches based on data from two different systems. Four models were derived from the literature, all of which seemed to yield satisfactory classification performance. To ensure the comparability of the different approaches, an abstract data-flow diagram was developed so that all models were trained following the same procedure. Furthermore, the selected approaches were applied to two datasets from real refrigeration systems to evaluate their performance as well as their transferability. Due to its high reliability, the EKF-ROSVM achieved the best classification performance in this study. The PCA-R-OCSVM showed promising results on the first dataset and even outperformed the EKF-ROSVM in most fault cases, but showed weaknesses in transferability on the second dataset. The two remaining approaches showed promising results as well, but were outperformed in each fault case by another of the applied algorithms. In future work, the effect of recursive classifiers on the fault detection performance should be investigated in more detail. This study did not explicitly examine this influence, but only compared a recursive classifier with non-recursive classifiers. In addition, a practical implementation of the identified fault detection approach could be carried out.