Enhancing Fault Detection in Wireless Sensor Networks Through Support Vector Machines: A Comprehensive Study

A Wireless Sensor Network (WSN) consists of many sensors distributed over a specific area to monitor physical conditions. Hardware limitations, limited resources, unfavourable deployment environments and various attacks on nodes can all lead to faulty nodes in a WSN. This raises the problem of detecting faulty nodes and avoiding data loss; detecting faulty nodes in real-world scenarios improves the quality of a WSN. The aim of this research was to develop an algorithm that determines the location of faulty nodes in a WSN. The algorithm uses Machine Learning, specifically Support Vector Machines (SVM), which classify data as true or false. A mathematical model for determining faulty nodes using SVM is considered, and a methodology for detecting a faulty node is demonstrated, consisting of data collection, feature extraction, training and testing of the model. Results of simulated experiments with 50 to 320 nodes are shown. The model is tested on data closely resembling real-world sensing data to evaluate its ability to detect failed nodes and to calculate the training and testing errors. The training error is 4.6667%, and the accuracy of detecting faulty nodes is 97%. The simulation results demonstrate the high stability of the proposed algorithm and its suitability for network environments with irregular node distribution or coverage gaps. In real scenarios, it can help ensure uninterrupted operation of the WSN and lossless data transmission. Shortcomings and prospects of research on fault detection in WSNs, such as studying real-world hardware issues and security, are presented.


INTRODUCTION
Today, Wireless Sensor Networks (WSNs) are a core component of the Internet of Things [1]. WSNs are growing rapidly because of their fast deployment and low cost, and because they can be used in environments where cable networks are difficult to install. WSN technologies are used in many fields, for example science, agro-industry [2], medicine [3], military affairs, industry, robotics and many others [4]. As in all communication networks, ensuring the quality of network operation is a central issue in WSNs; it is addressed by reducing overhead costs [5], minimising delay during data transmission [6], localising network nodes [7], saving energy [8], and reducing losses and delays during packet transport [9].
Connectivity is one of the main indicators of network quality and is defined as the ability of each node to find a path for data transmission [10]. Each node communicates within a limited radius and can discover other nodes within its connection radius. The link radius, the antennas and the locations of the network nodes all determine whether two nodes are connected, which is characterised by the connectivity probability [11]. The connectivity probability indicates whether data delivery is physically possible. However, this is not a sufficient condition for successful data delivery, as also noted in [12]-[17]: the actual probability of delivery depends on the quality of the channel on each hop of the route. Data delivery may be impossible if the network is overloaded with traffic, and malfunctions may also arise from interference between nodes or from third-party interference [18]-[20].
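To make the notion of connectivity probability concrete, the following Python sketch estimates, by Monte Carlo simulation, the probability that two nodes placed uniformly at random in a square field lie within each other's link radius, and compares it with the known closed form for this geometric case. It assumes an idealised unit-disk link model (a link exists exactly when the distance does not exceed the radius), which deliberately ignores the antenna and channel effects discussed above; the function names and the 100 m radius are illustrative assumptions.

```python
import math
import random


def link_probability_mc(side_m: float, radius_m: float,
                        trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(distance <= radius) for two nodes placed
    uniformly at random in a side_m x side_m square (unit-disk link model)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1, y1 = rng.uniform(0, side_m), rng.uniform(0, side_m)
        x2, y2 = rng.uniform(0, side_m), rng.uniform(0, side_m)
        if math.hypot(x1 - x2, y1 - y2) <= radius_m:
            hits += 1
    return hits / trials


def link_probability_exact(side_m: float, radius_m: float) -> float:
    """Closed form of the same probability, valid for 0 <= radius_m <= side_m:
    P = pi r^2 - (8/3) r^3 + r^4 / 2, where r = radius_m / side_m."""
    r = radius_m / side_m
    if not 0 <= r <= 1:
        raise ValueError("closed form only valid for 0 <= radius <= side")
    return math.pi * r ** 2 - 8 * r ** 3 / 3 + r ** 4 / 2


if __name__ == "__main__":
    # 1000 m x 1000 m field, as in the simulation; the 100 m radius is hypothetical.
    print(link_probability_mc(1000, 100), link_probability_exact(1000, 100))
```

With a 100 m radius in a 1000 m field, only about 2.9% of random node pairs are direct neighbours, which is why multi-hop routes, and the per-hop channel quality mentioned above, matter.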
A decrease in network performance is a sign of faulty sensor nodes: packet loss and delay increase in the WSN as congestion arises from limited bandwidth. The authors of [21]-[26] consider various approaches to localising failed nodes. Identifying failed sensor nodes can improve network performance, so one of the important research areas in WSNs is the continuous diagnosis of sensor nodes to obtain their status. This helps ensure continuous network service despite the failure of individual nodes [27]-[31]. This article discusses some pressing issues associated with distributed fault diagnosis for intermittent sensor failures in WSNs; diagnosing WSN failures is critical for maintaining network quality. Traditional methods for detecting a faulty sensor node exist, but their performance degrades under various conditions, for example when a network is deployed in an adverse environment [32]-[34]. In this regard, Machine Learning algorithms, which use experience to improve the performance of the system itself, have achieved good performance and offer an efficient, accurate and reliable way for nodes to self-detect faults [35]-[38].
In recent years, various classifications of WSN faults have been proposed. A node is generally considered faulty when at least one of its main parameters falls outside the established operating norm. One cause of failure, after which a node becomes inoperable, is software errors. Various fault detection methods are used to solve such problems [39]-[42].
Supervised learning (learning with a teacher) uses labelled inputs. Algorithms such as Random Forest and Logistic Regression, together with their behaviour across a range of predictor variables and sample sizes, are among the main supervised learning algorithms studied [43]-[46].
This research presents an algorithm for determining faulty nodes in a WSN. The contribution of the study is a methodology for detecting a faulty node based on a mathematical model using SVM and k-Nearest Neighbors (kNN), and modelling that achieves high algorithm stability for network environments with uneven node distribution or coverage gaps.
The paper is organised as follows: Section II presents the research methodology, including a systematic review of the scientific literature on fault detection in WSNs, and introduces a mathematical description of the SVM. Section III analyses the experimental studies and describes the software implementation. Finally, Section IV concludes the article and outlines further research.

A. Survey Methodology
In this section, we briefly describe the survey methodology used for fault node detection in WSNs. The work studies methods for detecting node faults in WSNs based on Machine Learning, in particular SVM, and identifies gaps and problems, as shown in Fig. 1. The search methodology consisted of several stages. First, keywords reflecting the main meaning and content of the topic were identified. Second, works were searched in the Google Scholar, IEEE Xplore, SpringerLink, Web of Science, Scopus and ScienceDirect databases using the keywords "Wireless Sensor Network", "Node Attacks", "Faulty Node Detection", "IoT Security", "Machine Learning" and "Support Vector Machine" for 2022-2023. Third, the retrieved works were analysed and duplicates were excluded. Next, research papers relevant to the direction of the study were selected. At the final stage, open-access works were selected for the analysis of the proposed solutions; if there were not enough works, all steps would be repeated starting from the first.
As a result, 81 papers were selected from the above databases, reflecting current issues and methods in the area of work.

B. Geographical Distribution of Publications
When working with the literature, the number of publications per country for 2022-2023 was analysed: for each study found, the country in the first author's affiliation was recorded, and the number of works per country was then summarised. A total of 484 articles were selected; the results are shown in Fig. 2. Fig. 3 also shows a map with the geographical location of the countries with the largest number of publications on determining faulty nodes in a WSN. The histogram shows countries by number of publications; the average number of publications for the specified period is 69. The country with a high were obtained with the help of the ACM Digital Library system. As the number of devices in various applications grows, the number of works and researchers offering their own solutions for detecting faulty nodes in WSNs is expected to increase every year, which in turn supports security and privacy when interacting with the Internet of Things.
At the same time, 899 discussions and more than 120,000 publications on the topic "Internet of Things" were found on the ResearchGate scientific social network; based on this analysis, the most highly cited researchers were identified and are listed in Table I.

Paper | Year | Main idea | Conclusion
[47] | 2023 | The authors proposed a fault detection algorithm that does not load sensor node (SN) resources when evaluating the WSN failure state | The authors evaluated the proposed algorithm using MATLAB and Google Colab; its effectiveness was determined by extensive simulation
[48] | 2021 | The authors propose a method for detecting IoT infections using SVM | The authors added a component reflecting the reliability of the proposed solution, which gave satisfactory results with an accuracy of 90.28%
[49] | 2019 | The authors propose a hybrid Machine Learning model for knowledge discovery from multivariate time-series data | Using a hybrid model with a Random Forest (RF), the results reach 94.86% accuracy
[50] | 2023 | The authors proposed a system to improve the performance of a wireless sensor network | An accuracy of 97.84% was achieved for 500 nodes, which confirms that the proposed system is competent for attack detection
[51] | 2019 | The authors propose a method based on the clonal selection concept of an artificial immune system; detected faults are classified as persistent, intermittent or transient using a probabilistic neural approach | The probabilistic neural network (PNN) model was used, and the result demonstrates 97% accuracy
[52] | 2022 | The authors present the results of a study indicating the superiority of the proposed DE-SVM and GWO-SVM approaches | The RF model was used in the work, and the result demonstrates 81% accuracy
[53] | 2020 | The paper analyses the SVM, RF, PNN, MLP and LSTM models | As the number of faults increases, LSTM achieves a higher detection rate, ranging between 80% and 90%
[54] | 2021 | The authors combine the CV-SVM and PSO-PNN methods | Four states are identified with accuracy rates of 83.3% and 72.5%, and three states with accuracy rates of 90% and 85%
[55] | 2022 | The paper investigates the influence of the sampling method on predicting faults in the network | With the SVM model the accuracy is between 0.29 and 0.83, while the Extra Trees model reaches 96%
[56] | 2022 | The authors present the results of a study intended to develop and implement a global approach to fault detection | The fault detection accuracy is 99%
[57] | 2022 | The authors propose an end-to-end deep learning framework for diagnosing sensor malfunctions | Accuracy of 100% in locating a faulty sensor, 98.7% in determining the fault type and 99% in reconstruction
[58] | 2022 | The authors propose a method for diagnosing machine faults based on WSN sensor computation and separable convolution | The paper considers CNN and ResNet models and proposes a method that reaches 98.3% accuracy, reduces the amount of data and saves 15% energy
[59] | 2022 | An IoT structure is developed and implemented in real time for complex electromechanical equipment | The paper considers the long short-term memory (LSTM) model, whose results reach 90.67% and 100% accuracy
[60] | 2023 | The article presents a process for building an acoustic emission fault detection system using Machine Learning methods | The method based on the fine decision tree ML model reaches 96.1% accuracy
[61] | 2023 | The authors use anomaly detection (AD) methods on a dataset to perform fault diagnosis with four unsupervised learning approaches based on different principles | The paper considers anomaly detection in WSN for fault diagnosis using Machine Learning
[62] | 2023 | The authors propose an approach based on the theory of spatial correlation | The paper proposes a scheme for determining the fault state of deployed sensor nodes using an SVM classifier based on Grey Wolf Optimization (GWO) [63]
An overview of scientific works is presented in Table II, which focuses on applying various Machine Learning methods to diagnose faulty wireless sensor network nodes. Machine Learning methods such as SVM, PNN, CNN, MLP, RF and LSTM are widely used in practice to detect and diagnose failures in WSNs and show high performance. In most works, SVM is used in conjunction with other methods and achieves high accuracy. The works reviewed in Table II can be grouped into the following categories:
− Fault detection algorithms that do not load WSN resources. These algorithms aim to reduce the energy consumption and computational complexity of the sensor nodes by performing fault detection at the base station or the cluster head [47].
− SVM. These methods use SVMs as a supervised learning technique to classify the sensor nodes into normal or faulty states [48].
− RF. These methods use RFs as an ensemble learning technique to detect and identify different types of faults in the sensor nodes [49].
− PNN. These methods use PNNs as a non-parametric learning technique to classify the sensor nodes based on their fault states [51].
− LSTM. These methods use LSTM networks as a deep learning technique to capture the temporal dependencies and patterns of the sensor data [53].
− Hybrid approaches. These methods combine different Machine Learning techniques to improve the performance and accuracy of fault detection [54].
− Anomaly detection. These methods use unsupervised learning techniques to detect outliers or anomalies in the sensor data that indicate faults [61].
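As an illustration of the unsupervised anomaly-detection category, the sketch below flags a node whose reading deviates from the median of its neighbours' readings by more than a threshold, a simple form of spatial-correlation checking. It is a generic example, not the method of any cited paper; the function name, the neighbour lists, the 2.0 threshold and the sample readings are hypothetical.

```python
from statistics import median
from typing import Dict, List, Set


def flag_outlier_nodes(readings: Dict[int, float],
                       neighbours: Dict[int, List[int]],
                       threshold: float) -> Set[int]:
    """Flag nodes whose reading differs from the median of their neighbours'
    readings by more than `threshold` (same units as the readings)."""
    flagged: Set[int] = set()
    for node, value in readings.items():
        nbr_values = [readings[n] for n in neighbours.get(node, []) if n in readings]
        if len(nbr_values) < 2:
            continue  # too few neighbours to judge this node reliably
        if abs(value - median(nbr_values)) > threshold:
            flagged.add(node)
    return flagged


if __name__ == "__main__":
    # Hypothetical temperature readings (deg C) from five mutually neighbouring nodes.
    readings = {0: 20.1, 1: 20.3, 2: 19.9, 3: 35.0, 4: 20.2}
    neighbours = {n: [m for m in readings if m != n] for n in readings}
    print(flag_outlier_nodes(readings, neighbours, threshold=2.0))  # {3}
```

The median is used instead of the mean so that a single faulty neighbour does not drag the reference value towards its own erroneous reading.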
Table III and Fig. 4 show publication statistics over the past 10 years on detecting faulty nodes in wireless sensor networks. A growth pattern is observed, which confirms the relevance of the chosen direction and the need for new methods of detecting faulty nodes in WSNs.
We then reviewed a number of existing fault detection studies with respect to the Machine Learning methods they use, such as SVM, PNN, CNN, RF, DT and LSTM, and organised them in the taxonomy diagram in Fig. 5.

C. A Machine Learning-based Node Positioning Concept
SVM was first proposed by Vapnik in 1995 [67]. It maps an input vector into a high-dimensional feature space using a nonlinear mapping selected in advance and constructs an optimal separating hyperplane in that space. The greater the distance (margin) between the separating hyperplane and the objects of the separated classes, the smaller the expected error of the SVM classifier. The structure of SVM-based faulty node detection is presented in Fig. 6 [68]-[72]. Suppose $N = \{(x_i, y_i)\}_{i=1}^{n}$ is a set of training samples, where $x_i \in \mathbb{R}^d$ is an input vector and $y_i \in \{-1, 1\}$ is its class label. The optimal hyperplane is described by equation (1), where $b$ is the classifier offset (bias) parameter:
$$w \cdot x + b = 0, \qquad (1)$$
where $w \in \mathbb{R}^d$ is the normal vector and $b \in \mathbb{R}$; the distance from a point $x$ to the hyperplane is $|w \cdot x + b| / \|w\|$, which equals $|w \cdot x + b|$ when $\|w\| = 1$.
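A minimal Python sketch of equation (1): the sign of $w \cdot x + b$ gives the predicted label, and $|w \cdot x + b| / \|w\|$ is the distance of a point from the hyperplane. The function names, the weight vector and the sample points are made up for illustration, and mapping a zero decision value to +1 is an arbitrary convention.

```python
import math
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]


def decision_value(w: Vector, x: Vector, b: float) -> float:
    """Evaluate w . x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Vector, x: Vector, b: float) -> int:
    """Predicted label in {-1, +1} (a zero decision value is mapped to +1)."""
    return 1 if decision_value(w, x, b) >= 0 else -1


def distance_to_hyperplane(w: Vector, x: Vector, b: float) -> float:
    """Geometric distance |w . x + b| / ||w||."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, x, b)) / norm


def margin(w: Vector, b: float, samples: Iterable[Tuple[Vector, int]]) -> float:
    """Smallest signed distance y_i (w . x_i + b) / ||w|| over labelled samples;
    a positive value means every sample lies on its correct side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision_value(w, x, b) / norm for x, y in samples)


if __name__ == "__main__":
    w, b = (0.6, 0.8), -1.0  # ||w|| = 1, so decision values are distances
    print(classify(w, (3, 4), b), classify(w, (0, 0), b))
    print(margin(w, b, [((3, 4), 1), ((0, 0), -1)]))
```

The SVM training problem below chooses $w$ and $b$ so that this margin is as large as possible while tolerating some training errors.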
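The hyperplane that separates the space linearly is found by solving the soft-margin minimisation problem (2):
$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0, \qquad (2)$$
which is usually solved through its dual (3):
$$\min_{\alpha} \ W(\alpha) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{n} \alpha_i \quad \text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \ C \ge \alpha_i \ge 0, \qquad (3)$$
where $C$ is a parameter that controls the trade-off between the margin and the training error. Implementing SVM requires a kernel function $K(x_i, x_j)$ [73], which transforms the input data into a high-dimensional feature space; in the linear case $K(x_i, x_j) = x_i \cdot x_j$. As a result, the nonlinear SVM decision function is described as (4):
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right), \qquad (4)$$
where $C \ge \alpha_i \ge 0$.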
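The decision function (4) can be sketched in a few lines of Python. The Gaussian (RBF) kernel and its `gamma` value are used here only as a common example of $K$ (the kernel actually used in the study is not stated here), and `dual_feasible` checks the constraints of (3). The support vectors and multipliers are toy values: for the two points $(1, 1)$ and $(-1, -1)$ with a linear kernel, $\alpha_1 = \alpha_2 = 0.25$ and $b = 0$ solve the hard-margin problem, giving $w = (0.5, 0.5)$.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """K(u, v) = u . v (the linear case)."""
    return sum(a * c for a, c in zip(u, v))


def rbf_kernel(u: Vector, v: Vector, gamma: float = 0.5) -> float:
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2); gamma is an assumed value."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(u, v)))


def dual_feasible(alphas: Sequence[float], labels: Sequence[int],
                  C: float, tol: float = 1e-9) -> bool:
    """Constraints of (3): sum_i alpha_i y_i = 0 and 0 <= alpha_i <= C."""
    balanced = abs(sum(a * y for a, y in zip(alphas, labels))) <= tol
    boxed = all(-tol <= a <= C + tol for a in alphas)
    return balanced and boxed


def svm_decision(x: Vector, support_vectors: Sequence[Vector], labels: Sequence[int],
                 alphas: Sequence[float], b: float,
                 kernel: Callable[[Vector, Vector], float] = rbf_kernel) -> int:
    """Equation (4): f(x) = sgn(sum_i alpha_i y_i K(x_i, x) + b), with sgn(0) taken as +1."""
    s = sum(a * y * kernel(sv, x)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if s >= 0 else -1


if __name__ == "__main__":
    svs, ys, alphas = [(1.0, 1.0), (-1.0, -1.0)], [1, -1], [0.25, 0.25]
    print(dual_feasible(alphas, ys, C=1.0))
    print(svm_decision((2.0, 0.5), svs, ys, alphas, 0.0, kernel=linear_kernel))
    print(svm_decision((0.9, 1.1), svs, ys, alphas, 0.0))  # RBF kernel
```

Only the support vectors (samples with $\alpha_i > 0$) contribute to the sum, which is what makes broadcasting the trained parameters to other nodes practical.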
"One-against-all": for $k$ classes, this method creates $k$ binary classifiers, each trained to distinguish one class from the remaining $k - 1$ classes. During testing, the class label is assigned by the binary classifier that produces the maximum output value. Drawbacks of this method include high memory requirements and unbalanced training sets.
"One-against-one": this method builds $k(k - 1)/2$ classifiers, one for each pair of classes. The method is symmetric but, compared with the previous method, requires more classifiers to be trained and stored, and their number grows quadratically as the number of classes increases.
"Error-correcting output codes": this method assigns each class a binary codeword and trains one binary classifier per codeword bit; a test sample is assigned to the class whose codeword is closest (for example, in Hamming distance) to the vector of classifier outputs. As in error-correcting codes for data transmission, this redundancy allows some errors of individual binary classifiers to be corrected, which improves performance.
The fault detection mechanism presented in this section uses the "one-against-one" strategy.
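The one-against-one strategy can be sketched as follows: one binary classifier per pair of classes, $k(k - 1)/2$ in total, and majority voting at prediction time. The class names and the threshold rules standing in for trained pairwise SVMs are hypothetical placeholders; in practice each entry of `EXAMPLE_PAIRWISE` would be an SVM evaluated with function (4).

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]


def ovo_pairs(classes: Sequence[Hashable]) -> List[Pair]:
    """All k(k-1)/2 unordered class pairs; one binary SVM is trained per pair."""
    return list(combinations(classes, 2))


def ovo_predict(x, classes: Sequence[Hashable],
                pairwise: Dict[Pair, Callable[[object], Hashable]]) -> Hashable:
    """Each pairwise classifier votes for one of its two classes; the class with
    the most votes wins, ties being broken by the order of `classes`."""
    votes = Counter(pairwise[pair](x) for pair in ovo_pairs(classes))
    return max(classes, key=lambda c: votes[c])


# Hypothetical classes and toy stand-ins for trained pairwise SVMs that act on a
# scalar anomaly score; keys follow the order produced by ovo_pairs().
EXAMPLE_CLASSES = ["normal", "faulty", "attacked"]
EXAMPLE_PAIRWISE = {
    ("normal", "faulty"): lambda s: "normal" if s < 1 else "faulty",
    ("normal", "attacked"): lambda s: "normal" if s < 1 else "attacked",
    ("faulty", "attacked"): lambda s: "faulty" if s < 2 else "attacked",
}

if __name__ == "__main__":
    for score in (0.5, 1.5, 3.0):
        print(score, ovo_predict(score, EXAMPLE_CLASSES, EXAMPLE_PAIRWISE))
```

Each pairwise classifier is trained only on the samples of its two classes, so individual training sets stay small even though the number of classifiers grows quadratically.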

A. Performance Evaluation
To study the behaviour of the model, we ran simulations in MATLAB on a PC with the following characteristics: CPU: Intel Core™ i7-1165G7 (4 cores); GPU: Intel® Iris Xe; OS: Microsoft Windows 11 Pro 64-bit; storage: 512 GB NVMe M.2 SSD. The choice of software version, hardware and operating system can affect the performance of the algorithm. One reason for selecting MATLAB is its animation capability, which allows the dynamic behaviour of a system to be visualised in real time [74].
The program creates a model of a wireless sensor network in a two-dimensional area of 1000 m × 1000 m. The number of nodes is set in the GUI; in the experiment under consideration, it is 150. Table IV shows the simulation parameters and their values. These parameters determine the efficiency of the algorithm and are used to evaluate the model: number of nodes, deployment area, initial energy, transmission range, carrier sensing range, population size and maximum number of iterations. For example, the initial energy characterises the node's ability to operate. As shown in Fig. 7, nodes are placed randomly within the specified area during the simulation. At the next stage of modelling, optimal connections between neighbouring nodes within the specified range are generated, as shown in Fig. 8. In WSNs, "trust" plays a very important role in node communication. Trust can be described as a set of attributes that provide security, reliability and protection in general. The trust calculation algorithm consists of five steps and is shown in Fig. 9, where RREQs are route requests, RREPs are replies to a request, and α and β are static weighting factors. Applying the trust calculation algorithm improved the modelling results, as reflected in a high accuracy of faulty node detection.
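The deployment and link-generation steps (Figs. 7 and 8) could look like this in Python: nodes are placed uniformly at random in the 1000 m × 1000 m field, and two nodes are linked when their distance does not exceed the transmission range. The 150 m range is a hypothetical value (the actual one is listed in Table IV), the simple distance-threshold link model is an assumption, and the function names are illustrative.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float]


def deploy_nodes(n: int, side_m: float = 1000.0, seed: Optional[int] = None) -> List[Position]:
    """Place n nodes uniformly at random in a side_m x side_m area."""
    rng = random.Random(seed)
    return [(rng.uniform(0, side_m), rng.uniform(0, side_m)) for _ in range(n)]


def build_links(positions: List[Position], tx_range_m: float) -> Dict[int, List[int]]:
    """Adjacency lists: nodes i and j are neighbours iff their distance <= tx_range_m."""
    adj: Dict[int, List[int]] = {i: [] for i in range(len(positions))}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) <= tx_range_m:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def isolated_nodes(adj: Dict[int, List[int]]) -> List[int]:
    """Nodes with no neighbour in range, i.e. candidates for coverage gaps."""
    return [i for i, nbrs in adj.items() if not nbrs]


if __name__ == "__main__":
    nodes = deploy_nodes(150, seed=42)          # 150 nodes, as in the experiment
    links = build_links(nodes, tx_range_m=150)  # hypothetical transmission range
    print(sum(len(v) for v in links.values()) // 2, "links;",
          len(isolated_nodes(links)), "isolated nodes")
```

The pairwise check is O(n²), which is adequate for the 50 to 320 nodes used in the experiments.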

B. Experimental Results and Analysis
The node localization process consists of several stages: training, broadcast and localization (Fig. 10).
1) Training phase. Each node in the network sends an information packet to the beacon nodes within its wireless range and marks beacon nodes outside its range as unreachable. This allows every node, including the beacon nodes, to calculate a distance vector to each beacon node and store it internally. Each beacon node transmits an information packet to the receiving node; the packet includes the beacon's identification number, its own coordinates and the distance vector stored at the node. At the receiving node, the algorithm performs SVM training for the X-axis and the Y-axis separately and computes all categories $cx_0, cx_1, \dots, cx_{M-1}$ and $cy_0, cy_1, \dots, cy_{M-1}$. The output of the training stage is the set of SVM parameters $\{x_i, \alpha_i^{x\star}, b^{x\star}\}$ and $\{x_i, \alpha_i^{y\star}, b^{y\star}\}$;
2) Broadcast phase. The receiving node transmits the SVM parameters $\{x_i, \alpha_i^{x\star}, b^{x\star}\}$ and $\{x_i, \alpha_i^{y\star}, b^{y\star}\}$ computed during training to every node in the network zones, so that each node has complete information to evaluate function (4);
3) Localization phase. After receiving the SVM parameters in step 2, the unknown node classifies its own distance vector with the SVM, determines its area category and takes the centroid coordinates $(x', y')$ of that area cell as its estimated position.
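A rough Python sketch of two ingredients of the localization scheme: building a node's distance vector to the beacons and converting predicted X and Y categories into the centroid of the corresponding area cell. Using breadth-first-search hop counts as the "distance", marking unreachable beacons with -1, and splitting each axis into $M$ equal intervals are assumptions made for illustration; the per-axis SVMs themselves are not shown.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def hop_counts(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_vector(adj: Dict[int, List[int]], node: int,
                    beacons: Sequence[int], unreachable: int = -1) -> List[int]:
    """Feature vector of `node`: hop count to each beacon, or `unreachable`."""
    return [hop_counts(adj, b).get(node, unreachable) for b in beacons]


def cell_centroid(cx: int, cy: int, m_classes: int, side_m: float) -> Tuple[float, float]:
    """Centroid of grid cell (cx, cy) when each axis of a side_m x side_m field
    is split into m_classes equal intervals (categories 0 .. m_classes - 1)."""
    if not (0 <= cx < m_classes and 0 <= cy < m_classes):
        raise ValueError("category index out of range")
    width = side_m / m_classes
    return ((cx + 0.5) * width, (cy + 0.5) * width)


if __name__ == "__main__":
    adj = {0: [1], 1: [0, 2], 2: [1], 3: []}   # chain 0-1-2 plus isolated node 3
    print(distance_vector(adj, 2, beacons=[0, 3]))
    print(cell_centroid(2, 3, m_classes=4, side_m=1000.0))
```

In a complete implementation, the X-axis SVMs would predict `cx` from the output of `distance_vector`, the Y-axis SVMs would predict `cy`, and `cell_centroid` would turn the pair into the position estimate.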
This model is tested on data closely resembling real-world sensing data to evaluate its ability to detect failed nodes and to calculate the training and testing errors. The training error was 4.6667%; the training error is the proportion of training samples that the model misclassifies. The simulation carried out in this research is close to a real-world scenario because it is based on normally distributed values.
Fig. 10. The process of node localization
Fig. 11 shows how attack detection is implemented: the program randomly selects some nodes and marks them as unknown, then applies the previously trained SVM to test these nodes and determines which of them are attacked. The attacked nodes are circled in red. For more reliable results, the experiment was repeated 10 times with the same network parameter settings and the results were averaged over the 10 runs; in each run, all nodes were placed randomly. The performance of a WSN node positioning algorithm directly affects its applications, so the study focused on the algorithm's positioning error, classification accuracy and ranging error. The experiments were conducted with 50 to 320 nodes. The results presented in Table IV show the effect of different node ratios and communication radii on the classification accuracy and positioning error of the algorithm.
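The error measures above can be computed as in the following Python sketch: the error rate is the percentage of misclassified samples, and the reported value is averaged over repeated runs. The function names are hypothetical. As a worked example, 7 misclassifications out of 150 samples give $7/150 \approx 4.6667\%$, which matches the reported training error for a 150-node network; that this is how the figure arose is an assumption, not something stated in the study.

```python
from statistics import mean
from typing import Hashable, Sequence, Tuple


def error_rate_percent(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Percentage of samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("expected two non-empty label sequences of equal length")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return 100.0 * wrong / len(y_true)


def averaged_error_percent(
        runs: Sequence[Tuple[Sequence[Hashable], Sequence[Hashable]]]) -> float:
    """Mean error rate over repeated runs, e.g. the 10 repetitions used in the study."""
    return mean(error_rate_percent(t, p) for t, p in runs)


if __name__ == "__main__":
    y_true = [1] * 150                 # hypothetical labels for 150 nodes
    y_pred = [-1] * 7 + [1] * 143      # 7 misclassified nodes
    print(round(error_rate_percent(y_true, y_pred), 4))  # 4.6667
```

The same function applied to held-out nodes gives the testing error; accuracy is simply 100% minus the error rate.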
The simulation results are shown in the following graphs. As shown in Fig. 12, for the same link radius, the more beacon nodes there are, the higher the classification accuracy, and the classification accuracy also increases with the radius. With a larger communication radius and a higher proportion of beacon nodes, more beacon nodes fall within each node's communication range, so the distance vector produced during the learning phase reflects the node location more accurately and the classification accuracy is higher. The more nodes, the higher the error rate. As illustrated in Fig. 13, for the same proportion of beacon nodes, the positioning error is smallest with a longer communication radius. For a given link radius, the positioning error decreases as the proportion of beacon nodes increases, although the difference is rather small, showing that the technique does not require a high proportion of beacon nodes. We therefore conclude that the positioning algorithm also performs well in networks with relatively few beacon nodes. Detecting faulty nodes improves the quality of a WSN and helps avoid data loss; the accuracy of detecting faulty nodes was 97%. Because the classification accuracy and positioning error are relatively tolerant of variations in ranging error, the algorithm is suited to network environments with sparse beacon nodes and high ranging errors. The algorithm shows high reliability, stability and flexibility.
The proposed approach was simulated in MATLAB, but when applying the algorithm in real scenarios, it is important to take into account hardware, software, energy efficiency, type of communication, scalability, testing and integration factors.Careful consideration is necessary for equipment selection, compatibility, energy efficiency, data management, communication protocols, and security to ensure the algorithm's successful implementation and reliability in practical applications.
The presented research demonstrates a high detection rate of faulty nodes in a WSN, but it is also important to note potential hurdles that may arise during actual deployment:
• Machine Learning methods cannot ensure complete security in WSNs [75]-[77];
• the hardware used is not always suitable for the learning process [78], [79].
Countries and academic researchers with a large number of published articles have been identified and presented in this article. The number of WSN applications is increasing every day, which contributes to the emergence of various security issues. It is impossible to single out any one country as the leader in terms of technological achievements, but the number of works and researchers offering their own solutions for detecting faulty nodes in WSNs is impressive and emphasises the importance of security.
In this article, Machine Learning is introduced into the node positioning technology of a wireless sensor network. Positional relationships among nodes serve as training data, and links between beacon nodes and unknown nodes serve as test data. If erroneous data goes unrecognised, faulty sensors provide false information to the application, which might cause significant harm. The SVM classifier was therefore applied to deal with these issues, because it can distinguish between accurate and false data. After the algorithm has been trained on the training data, it broadcasts the SVM parameters to every node in the network. When an unknown node receives them, it determines its location from its distances to the beacon nodes. Simulation results show that this positioning algorithm is more resistant to ranging errors and more suitable for network environments with sparse beacon nodes. The algorithm has high stability and is suitable for network environments with an uneven distribution of nodes or gaps in coverage.
The goal of this work is to use SVM to detect faulty nodes in WSNs. To achieve this goal, the following steps were completed:
• analysis of the mechanism for detecting faulty components;
• experimental assessment of the effectiveness of the developed method by simulating its operation in MATLAB.
While the proposed algorithm holds promise for real-world applications beyond its current MATLAB implementation, its successful execution hinges on addressing multifaceted challenges related to hardware, software, energy management, scalability, communication and security. Rigorous field testing and validation are a prerequisite for ascertaining its practicality and performance in diverse real-world environments. One issue that may arise when deploying the algorithm is energy management: it is not always possible to predict how much energy the training process will consume, which may affect the performance of the system as a whole.
This research addresses the problem of detecting faulty nodes in a WSN, thereby supporting a high-quality wireless network while preventing data loss. The developed algorithm can be used in Smart City, Smart Greenhouse and similar applications where the number of nodes does not exceed 320.
Future work will include a comparative analysis of SVM with other methods and with modifications of SVM. The presented work can serve as a starting point for studying this pressing issue and for developing new methods for determining faulty nodes.

Fig. 2. Total publications by country

Fig. 6. Structure of fault detection based on SVM
Detecting faulty nodes in WSNs involves different levels of complexity. One reason for this complexity is the limited resources and capabilities of each node. Using classifiers at the node level can help solve this problem.

Fig. 12. Learning errors
Table V shows the experimental results, including the number of nodes and the corresponding training and testing error percentages (averaged over 10 trials) for the different node configurations in the study.

TABLE I. THE MOST ACTIVE AUTHORS BETWEEN 2022 AND 2023

TABLE II. SOME STUDIES OF FAULT NODE DETECTION IN WSN

TABLE III. THE NUMBER OF PUBLICATIONS

TABLE IV. SIMULATION PARAMETERS AND VALUES

TABLE V. THE RESULTS OF THE EXPERIMENTS