Injury Prediction in Sports using Artificial Intelligence Applications: A Brief Review

Abstract: Avoiding injuries in sports has always depended on historical records and human experience, even though injuries remain a major and unsolved issue. The development of more precise preventative procedures using currently available approaches has been extremely slow. Technological advances have made artificial intelligence (AI) and machine learning (ML) more accessible, opening them up as potentially valuable tools for improving injury prevention and recovery procedures. This article presents a detailed summary of ML approaches as they have been used to date to predict sports injuries. The research conducted over the last five years has been collated, and its results have been summarized. Given the present absence of accessible data sources and standardized statistics, and a dependence on obsolete regression models, it is impossible to draw any definitive conclusions regarding the real-world effectiveness of ML for the prediction of sports injuries. However, it has been hypothesized that resolving these two problems would make it possible to deploy innovative, robust ML architectures, which would accelerate progress in this area while also offering proven clinical tools.


I. INTRODUCTION
Machine learning (ML) is a complex field that may be characterized, in a general sense, as the design of computer systems that can provide predictive analytics by learning from experience and adapting without being given explicit instructions [1][2]. The use of ML has grown in a variety of sectors, including sports medicine, as available computing resources have continued to improve. Because injuries are prevalent and may have significant repercussions, not just physically but also emotionally and financially, the evaluation, mitigation, and prevention of injuries are of the utmost importance. This is particularly true at the professional level. A wide range of ML models have been developed in previous research [3][4][5][6] to understand the complex factors contributing to player injury and to improve forecasting accuracy. Larger and more complicated ML algorithms, including implementations of previously theoretical methods, are becoming feasible as computer technology continues to progress. As new methods are implemented and existing ones are applied more efficiently, it is necessary to regularly gather and examine the literature that has been used for injury prediction and prevention or that may be applied to these purposes in the future. In addition, even though recent literature reviews have explored specific facets of this field, they have some limitations: the majority are written from the viewpoint of data mining [5], are sport-specific [7][8][9], have a limited scope [3,4,10], or concentrate only on team sports [6]. Our goal is to provide a comprehensive review of the current status of ML in sports injury research, spanning various sports and various algorithms. Algorithms have been classified according to their purpose, their limits, and their existing or future use in the field of sports medicine in order to offer a platform for developing innovative ML models and approaches.
A brief introduction and a review of the pertinent research published in the preceding five years are provided for each chosen algorithm. Even though these background sections give context for the various methods, a quick description of general ML concepts is still helpful. In this review, the term "algorithm" refers to a collection of mathematical calculations and guiding principles for a certain ML methodology. When determining a result mathematically, each algorithm employs its own unique set of rules and equations [2]. "Training a model" refers to the process of applying an algorithm to a data set in an organized manner using those rules and equations. Before being put into use, ML algorithms need to be chosen and trained. Several concepts related to this subject are explained below.
• Data set - The whole collection of data that is used in training and validating a system. This information may be presented in a wide range of formats, but in most cases the algorithm requires that it be structured in a certain way.
• Batches - Subsets of records chosen to be processed together by an algorithm. Batching is often required owing to memory limitations and is frequently desired for optimization and training purposes.
• Feature and feature extraction - Data are broken down into separate, quantifiable pieces called features. Selecting features from a data set that are distinctive and predictive may be referred to as feature extraction. The feature set is the collection of extracted features used in training a model.
• Labels - Human-provided annotations used to give context to the data before an ML algorithm is trained. For example, a photograph of a puppy may be manually tagged as a "puppy" before the algorithm is trained.
• Supervised learning - The practice of directing the training of an algorithm by giving it "labeled" data.
• Overfitting - The propensity of ML models to "memorize" the data they are trained on. In other words, the model learns only the patterns of the training data, regardless of whether there is a true mathematical relationship between the variables. This makes the model less applicable to broader situations and is a common concern when working with data sets that include a large number of features.
• Hyperparameters/parameters - Parameters are the values inside a model that are determined by the data set during training. Hyperparameters are fixed before a model's training and typically have a significant influence on the resulting model parameters.
• Error measurements - Quantitative metrics of inaccuracy that may be computed with formulae such as the root mean squared error.
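As a concrete illustration of the error measurements defined above, the following minimal sketch computes the root mean squared error, i.e., the square root of the mean of the squared differences between observed and predicted values. The function name `rmse` and the example values are hypothetical and are not taken from any of the reviewed studies.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical example: observed vs. predicted days lost to injury.
observed = [3.0, 0.0, 7.0, 2.0]
predicted = [2.0, 1.0, 5.0, 2.0]
print(rmse(observed, predicted))
```

A lower value indicates predictions that lie closer to the observations; because the errors are squared, large individual errors are penalized more heavily than small ones.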
Constructing a data set is a necessary step that must precede the selection of an appropriate algorithm. The data format directly influences both the algorithm that is used and the application that is developed. In most cases, data sets are divided into two groups: training data and testing data. Whether or not the training data are labeled determines whether supervised or unsupervised learning is used. A portion of the data is set aside as validation or test data to establish whether the algorithm was effectively trained [2]. When trying to improve the utility of a model, practically everyone agrees that larger data sets are preferable. However, when only small data sets are available, statistical approaches can be used to increase the number of available data points, which improves the predictive ability of the model. This strategy is better suited to evaluating different ML methods than to training new models for use on real-world data. After the data have been selected and partitioned, the next step is to extract the features. These features may be identified manually, which is a time-consuming operation; alternatively, they may be identified automatically as a function of the specific algorithm.
Feature extraction frequently constitutes a crucial step in model optimization [5][6][7][8][9][10][11][12]. Finally, after all of the preceding steps have been completed, a model may be trained. This practice is common in ML and AI. Additionally, it may help to optimize training [2]. Model validation and assessment may take place after training has been completed. Successful validation and assessment depend on several prerequisites, including the use of separate data sets for training and testing, the use of an appropriate error measure, the use of simulated data when working with smaller data sets, and an awareness of common pitfalls in ML [11][12][13]. K-fold cross-validation is currently the gold standard for validation. For example, if K equals 10, the data are randomly divided into 10 equal parts, with 9 parts used for training and 1 held out for validation. The held-out part is then rotated so that the resulting estimate is more general [14]. It is important to note that most methods rely on shuffling or randomizing the training data. Other strategies often used for validation are not discussed here, since they are not relevant to the topic at hand.
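The K-fold procedure described above can be sketched in a few lines of Python. This is an illustrative implementation only: the function name `k_fold_indices` and the `seed` argument are assumptions made for the example, and the reviewed studies may have used library implementations with different defaults.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the K folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # randomize before partitioning
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


# With K = 10 and 20 samples, each fold trains on 18 and validates on 2.
for training, validation in k_fold_indices(20, k=10):
    print(len(training), len(validation))
```

Each sample is used for validation exactly once, so the error averaged over the K folds uses all of the data without ever scoring a model on examples it was trained on.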

II. METHODOLOGY
An exhaustive literature search was carried out using Ovid Discovery Search and Google Scholar, both of which return results gathered from a wide variety of databases. Additionally, individual searches were conducted in PubMed/Medline, ScienceDirect, and IEEE/IET. Although earlier publications were consulted for context, the primary emphasis was on those published between 2017 and 2023. K-means, random forest, K-Nearest Neighbor (KNN), gradient boosting, AdaBoost, decision tree, and artificial neural networks were among the algorithms chosen after an initial assessment of the relevant literature. The following combination was used as the search phrase for each algorithm: "algorithm name" + "sport" + "injury", for example: "Artificial intelligence" + "sport" + "injury prediction". An effort was made to include variants of the names and abbreviations of the algorithms. This review includes papers that focus on the forecasting and analysis of sports injuries, as shown in Fig. 1 and Fig. Papers were excluded if we could not obtain them or if they were not available in English. Based on these criteria, 40 unique research articles and 8 review papers were chosen for inclusion in the study. To provide context, a short overview of each algorithm is provided.

III. REVIEW OF MACHINE LEARNING IN SPORTS
The following is a summary of the findings from the comprehensive literature review. Each section offers background information on the applicable algorithm to provide context, followed by an Applications section that summarizes the findings of the surveyed publications. Papers were assigned to sections according to the method tested. When more than one algorithm was investigated, papers were assigned to the section for the algorithm that proved most successful and, where applicable, to the sections for methods that proved nearly as successful. No attempt has been made to statistically compare or otherwise aggregate the findings, since the designs and goals of the studies varied so widely. Instead, we focus on broader patterns in the discussion. Similarly, patterns of shortcomings or pitfalls are discussed in the relevant section. After a short introduction to general algorithm design, publications relating to neural networks have been further subdivided to accommodate the variety of neural network implementations.

A. KNN
KNN is a supervised ML approach that solves regression and classification problems by grouping data points according to their degree of similarity. It is widely used in a variety of medical specialties. For instance, oncology research using KNN has successfully classified subtypes of acute cancer cells [15]. KNN has also been used to evaluate and classify the vibroarthrographic signals produced by degenerative knee joints [16]. The method operates on the assumption that data points with similar characteristics will be located relatively near one another with respect to the distance function. Therefore, in a straightforward classification problem, KNN assigns a class to each given data point based on the classes of its surrounding data points. In practice, the KNN algorithm estimates the data density by using a weighted smoothing function. Because the weighting is determined by the number of neighbors, K, this in essence sets the size of the bin, which leads to small bins in regions of high data density and large bins in regions of low data density. Kernel functions may be used to smooth the density estimates further. KNN is advantageous for several reasons: it is relatively easy to understand and implement, and it can provide good predictions with only a little data [17]. However, the KNN technique becomes proportionately more computationally expensive and inefficient when applied to massive data sets. This challenge is not insurmountable, but data condensing and dimensionality reduction are required to address it [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18]. In sports medicine, athletes may be fitted with specialized sensors such as gyroscopes, magnetometers, accelerometers, and infrared sensors so that data can be collected. KNN analyzes the data gathered from various body parts of athletes to draw conclusions about the movements those athletes perform during particular sports events. Patterns that put a person at risk of injury may be identified using this recognition model, which opens the door to injury avoidance [19]. In addition to its broad use as a comparison algorithm, KNN was used for injury prediction in a 2018 study [20] as part of a larger model that included K-means and SVM.
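The neighbor-voting idea behind KNN classification can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and unweighted majority voting; the function name `knn_predict`, the two-feature data, and the class labels are hypothetical and do not come from the cited studies.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query point.
    neighbors = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature sensor summaries with assumed risk labels.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["low_risk", "low_risk", "low_risk", "high_risk", "high_risk", "high_risk"]
print(knn_predict(train_x, train_y, (1.1, 1.0), k=3))
```

Because every prediction requires a distance to every stored training point, the cost grows with the size of the data set, which is the inefficiency noted above.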

B. K-Means
The K-means algorithm is one of the most popular clustering approaches because it is easy to implement. K-means is an iterative technique that splits a data set into subgroups referred to as clusters. These clusters are constructed to minimize the sum of the squared distances between the data points and their cluster centroid, which is the arithmetic mean of all the data points in that cluster. The less variation there is within a cluster, the more similar its data points are to one another [21]. In practice, K-means clustering begins with the random selection of K initial centroids from a data set of n elements [22]. The Euclidean distance between each data point and each centroid is then computed, and each point is assigned to the cluster of its nearest centroid (see Fig. 3). The centroids are then updated to each cluster's computed mean. This procedure is repeated until the improvement in clustering reaches a plateau, which may be identified by the centroids becoming stable [23]. In a study published in 2020, Dingenen et al. employed K-means to demonstrate that runners who had suffered similar injuries could be grouped into two distinct subgroups with a mean silhouette coefficient of 0.53 [24]. These subgroups were used to illustrate the varied kinematic factors that might contribute to running-related injuries. In 2022, Ibáñez et al. used K-means as a data separation approach to classify female basketball players into first and second divisions. This study employed K-means to examine thresholds of acceleration, deceleration, impact, and speed for the players, and it found a difference between the first division and the second division [25]. As these recent publications show, K-means continues to be successful when applied to classic clustering problems and may be suitable for investigating injury risk factors or player attributes. This is likely because it is simple and already familiar to many people.
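The assign-and-update loop of K-means can be sketched as below. This is a minimal illustration under stated assumptions: the function name `k_means`, the optional `init` argument (added so that the starting centroids can be fixed for reproducibility), and the toy data are hypothetical, and empty clusters simply keep their previous centroid.

```python
import math
import random


def k_means(points, k, init=None, max_iter=100, seed=0):
    """Cluster `points` (a list of tuples) into k groups; return (centroids, assignments)."""
    if init is not None:
        centroids = list(init)
    else:
        centroids = random.Random(seed).sample(points, k)  # random initial centroids
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # assignments are stable, so the centroids are stable too
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical kinematic summaries forming two loose groups.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = k_means(data, k=2)
print(centroids, assignments)
```

Because the outcome depends on the initial centroids, K-means is commonly run several times with different starting points and the solution with the lowest within-cluster sum of squares is kept.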

C. Support Vector Machines
SVMs are a kind of supervised learning algorithm that classifies data points into distinct groups by using hyperplanes. The orientation and location of a hyperplane are determined by a subset of data points referred to as support vectors. SVMs position the hyperplane so as to maximize the separation between the two groups, also known as the maximum margin [26][27]. See Fig. 4 (a) for an example of this. After being trained on a data set, an SVM may be used to classify new data points and find relevant patterns hidden within data [28].
SVMs have been trained to predict future injuries for sport-specific applications by using modifiable and non-modifiable metrics such as genetic markers, training load, neuromuscular assessments, performance, previous injury history, anthropometric measurements, and psychological measures [29][30]. Recognizing injury risk variables such as these enables trainers and medical staff to change training loads, regimens, and tactics to avoid future injuries [6]. For instance, in a study published in 2018, Ruddy et al. [31] employed a variety of ML algorithms, one of which was the SVM, to analyze the risk factors for hamstring strain injuries. In another 2018 study, by Carey et al., which likewise investigated the prediction of hamstring injuries and the associated risk variables, SVM gained a significant advantage from data pre-processing, although it was eventually surpassed by straightforward logistic regression [32]. A 2017 study that predicted in-game injuries in Major League Soccer using non-physiological data found that SVMs were the most accurate of the methods investigated, which included random forest, multilayer perceptron, and logistic regression [33]. On the other hand, SVMs have been shown to be less successful than other ML algorithms in recent research, including two 2021 publications that compare the effectiveness of several ML techniques [34][35]. Despite this, SVMs may still be useful since they are well suited to high-dimensional data sets. This is particularly true when SVMs are integrated with other approaches, as in a 2022 work by Wang et al. forecasting multiple jump injuries [36].
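To make the idea of a maximum-margin hyperplane more concrete, the sketch below trains a linear SVM by sub-gradient descent on the regularized hinge loss. This is an assumption-laden illustration rather than the solver used in any cited study: production SVM libraries typically rely on quadratic programming and support kernels, and the names `train_linear_svm` and `svm_predict`, the learning rate, the regularization strength, and the data are all hypothetical.

```python
def train_linear_svm(xs, ys, lr=0.1, reg=0.01, epochs=100):
    """Fit weights w and bias b of a linear SVM; labels in `ys` must be +1 or -1."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: hinge loss is active.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regularization term contributes to the sub-gradient.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical standardized features; +1 = injured, -1 = not injured.
xs = [(-2.0, -1.0), (-1.0, -2.0), (-2.0, -2.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
ys = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(xs, ys)
print([svm_predict(w, b, x) for x in xs])
```

Only points with a margin below 1 change the weights, which mirrors the role of the support vectors described above: points far from the boundary do not influence the final hyperplane.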

D. Decision Tree
A decision tree (DT), as shown in Fig. 5, is a form of supervised ML that predicts an output class from a set of input features by using an iterative process that divides data sets according to certain attributes. Starting at the input node, also known as the root node, the data points are divided into several bins depending on their values for a certain attribute. Each of these bins is then checked recursively to assess whether its data points can be divided again into smaller bins to obtain a higher level of accuracy. This process continues until all of the nodes have reached the desired size or purity. Bins that can be subdivided further are referred to as decision nodes, while bins that cannot, and therefore denote a final decision, are termed leaf nodes [37]. In recent years, more modern variants of the traditional decision tree algorithm have been widely used. In 2018, Connaboy et al. used decision trees built with Chi-squared Automatic Interaction Detection (CHAID) to investigate the factors involved in lower extremity injuries sustained by military personnel. Using their model, the authors identified several variables that, in combination, led to an increased risk of injury over one year [38]. Mendonca et al. used a classification and regression tree (CART) to study the relationships between various risk factors and patellar tendinopathy in volleyball and basketball athletes [39]. In a study published in 2021, Kolodziej et al. used a CART-DT to identify injuries sustained by youth soccer players, attaining a specificity of 0.91 and a sensitivity of 0.73 [40]. Another 2021 work, by Ruiz-Perez et al., tried to replicate a model published in 2020 by Rommers et al. that used field data gathered by GPS. Although they found that C4.5 DTs compared favorably with numerous modeling techniques, including ADTree, SVM, and KNN, they did not use the same method as Rommers et al. and did not obtain similar performance (AUC 0.767 vs 0.850) [41,42]. In contrast to these somewhat encouraging findings, Rossi et al. found that DTs, although they outperformed comparator algorithms, were unable to reach a precision of more than 50% when predicting soccer injuries [43]. Although the efficacy of decision trees varies depending on the data and the model structure, they clearly have a place in the field of sports injury prediction. However, they may not generalize well and may overfit during training, which reduces their accuracy [44].
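The splitting step at the heart of a decision tree can be illustrated with a short sketch that searches for the single threshold producing the purest pair of bins. Gini impurity is assumed here as the purity measure, as in CART classification trees; CHAID and C4.5 use different criteria. The function names `gini` and `best_split` and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    feature_index and threshold are None if no split improves on the parent node.
    """
    best = (None, None, gini(labels))
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send data to both bins
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (feature, threshold, score)
    return best


# Hypothetical rows: (weekly training load, previous injury flag); label 1 = injured.
rows = [(10, 1), (12, 0), (30, 1), (35, 0)]
labels = [0, 0, 1, 1]
print(best_split(rows, labels))
```

A full tree applies this search recursively to each resulting bin until the bins are pure or reach a minimum size, which is also why an unconstrained tree can end up memorizing its training data.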

E. Random Forest (RF)
RFs, which are collections of randomized DTs, offer a possible benefit over individual decision trees, since DTs may lack generalizability and tend to overfit during training [44]. The random forest technique generates a collection of decision trees, which then vote on the model's final output (see Fig. 6). The first step in building a random forest model is to resample the initial data by bootstrapping, that is, random sampling with replacement. This helps to guarantee that the same data are not used for every tree, which makes the model less sensitive to the particular training data. Decision trees are then trained independently of one another using a randomized subset of features, which lowers the correlation among the trees. Predictions are formed by running the data through each tree and then aggregating the results [45]. Unfortunately, RF models lack the transparency of DTs, so supplementary approaches are required to determine the importance of individual features. Random forests may also struggle with high-dimensional data because uninformative features may be used during node splitting [46]. The use of random forest models to predict injury has met with varying degrees of success. Compared with conventional regression techniques, the random forest algorithms used in an investigation of sports-related dental injuries in children produced somewhat more accurate predictions [47]. In a study published in 2020, the authors attempted to address inconsistent predictive performance by identifying critical risk indicators before training the model, and they achieved an AUC of 0.79 [48]. A random forest model built in a 2022 publication reached a comparable level of performance, with an AUC of 0.72 [49]. Random forests were also used in a study of Paralympic swimmers, conducted to assess eligibility for participation, in order to classify those with and without brain injuries [50]. The model correctly classified 96% of the 51 participants. In contrast to these findings, a 2021 study found that random forest predicted ankle injuries in young athletes with performance comparable to that of logistic regression (ROC AUC 0.63 vs 0.65, respectively) [51]. Even though they are susceptible to perturbations in the data sets they are fed, random forest models can outperform other classification approaches provided that they are applied correctly and that appropriate features are selected without bias.
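The three ingredients described above (bootstrapping, random feature selection, and voting) can be sketched as follows. To keep the example short, each tree is reduced to a single-split stump trained on one randomly chosen feature, which is a simplifying assumption; real random forests grow deeper trees and sample a feature subset at every split. All function names and data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) examples at random with replacement."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def fit_stump(rows, labels, feature):
    """One-split tree on one feature: choose the threshold with the fewest errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample_rows, sample_labels = bootstrap_sample(rows, labels, rng)
        feature = rng.randrange(len(rows[0]))  # random feature for this tree
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest


def forest_predict(forest, row):
    """Aggregate the trees' predictions by majority vote."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Hypothetical two-feature data; label 1 = injured, 0 = not injured.
rows = [(1, 1), (2, 2), (1, 2), (2, 1), (8, 8), (9, 9), (8, 9), (9, 8)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(forest_predict(forest, (0.5, 0.5)), forest_predict(forest, (9.5, 9.5)))
```

Individual stumps may be wrong when their bootstrap sample is unrepresentative, but the vote across many decorrelated trees tends to cancel those errors out.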

F. Gradient boosting and AdaBoost
The AdaBoost method was first presented in a 1996 publication by Freund and Schapire [52]. Gradient boosting (GB) is a generalization of that approach. AdaBoost is an ensemble approach that aims to combine several weak learners, traditionally single-split decision trees referred to as stumps, into a stronger model. This is worth pursuing because it offers a solution to many of the issues associated with decision trees [52]. Gradient boosting implements boosting as a form of gradient descent, improving the model with each successive iteration and allowing the use of an arbitrary differentiable loss function. It addresses some of AdaBoost's shortcomings, notably its sensitivity to outliers and its inability to perform multi-class classification [53]. Gradient boosting and AdaBoost are powerful algorithms that have undergone continual development since their inception, which has enabled them to be applied in various contexts to regression and classification problems. Accordingly, a combination of gradient boosting and AdaBoost methods, as in Fig. 7, is used for injury prediction. For specific classification problems, gradient boosting routinely achieves better results than baseline regression and a variety of ML methods, such as SVM and DT [54][55][56][57][58][59]. When evaluating the elbow valgus force and shoulder distraction force of 168 high school and collegiate pitchers, Nicholson et al. found that GB was the most successful of the numerous methods tested [57]. In a 2019 study predicting skier injuries, the researchers found that GB achieved a 0.25 improvement in accuracy over logistic regression, with an AUC of 0.76 vs. 0.52 [54]. In a 2022 prospective observational cohort study predicting non-contact time-loss injuries in 88 soccer players, Hecksteden et al. found that GB outperformed other methods [58]. A 2022 study went beyond ordinary gradient boosting and employed XGBoost (extreme gradient boosting) to predict post-concussion injuries in 74 collegiate football players with an accuracy of 91.9% [60]. In a study published in 2020, Rommers et al. also used XGBoost, in this case to predict injuries in 734 youth soccer players, and they achieved an accuracy of 84% and a recall of 83%. In addition, the researchers classified injuries as either acute or overuse with a recall and accuracy of 82% [42]. A recent retrospective study [61] also employed XGBoost to investigate the relationship between self-reported player injuries and biomechanics. Notably, only one recent study, a 2022 article predicting injury among CrossFit practitioners, was found to apply AdaBoost. With an area under the curve (AUC) of 77.93%, AdaBoost performed better overall than the comparison algorithms [56]. Another algorithm, SmooteBoostM1, was used to predict hamstring injuries in professional soccer players and produced a model with an AUC of 0.837 [62].
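The view of boosting as gradient descent can be illustrated with a small regression example. For the squared-error loss, the negative gradient with respect to the current prediction is simply the residual, so each round fits a stump to the residuals and adds a damped copy of it to the model. This is a minimal sketch assuming a single numeric feature and squared-error loss; it is not XGBoost or AdaBoost, and all names and values are hypothetical.

```python
def fit_regression_stump(xs, residuals):
    """Best single-threshold split on a 1-D feature, minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit an additive model of stumps; xs must contain at least two distinct values."""
    base = sum(ys) / len(ys)  # initial prediction: the mean target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 0.5 * (y - prediction)^2.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_val, right_val = fit_regression_stump(xs, residuals)
        stumps.append((threshold, left_val, right_val))
        preds = [
            p + learning_rate * (left_val if x <= threshold else right_val)
            for x, p in zip(xs, preds)
        ]
    return base, learning_rate, stumps


def boost_predict(model, x):
    """Sum the base prediction and the damped contributions of all stumps."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left_val if x <= threshold else right_val
        for threshold, left_val, right_val in stumps
    )


# Hypothetical data: a step from 0 to 1 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = gradient_boost(xs, ys)
print(boost_predict(model, 1), boost_predict(model, 6))
```

The learning rate trades speed for stability: smaller values need more rounds but make each weak learner's contribution less likely to overfit.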

G. Artificial Neural Networks (ANN)
In comparison to several other methods of prediction, neural networks provide a number of important benefits. They are organized as a network of interconnected nodes called neurons (see Fig. 8). These neurons are self-contained sets of operations that generate values depending on the input they receive. ANNs allow models to learn from massive quantities of data and identify patterns that would otherwise be difficult to extract. There are two main kinds of ANN: recurrent and feed-forward. In feed-forward ANNs, the output of one node is passed on to subsequent nodes in the network. In recurrent ANNs, outputs are also fed back to preceding nodes [12,63]. The techniques and structures available for use in the nodes of neural networks are quite diverse. An overview of all of these methods is beyond the scope of this review; nevertheless, several, such as RBF, CNN, LSTM, and DGCN, are examined in further detail. Convolution is a mathematical operation that applies a kernel matrix to each pixel of an image to produce a new image. Both image filtering and image classification may benefit from this operation. Beyond image classification, convolution may be applied to any two-dimensional array of numerical data. In ML, a CNN is characterized by its use of layers that alternate between convolution and pooling to produce a feature map and, ultimately, an output [64]. Image analysis, with its two-dimensional structure and high feature density, has long been the traditional use of CNNs because images lend themselves well to the convolutional representation. However, CNNs may be used for any suitably formatted data, which opens the door to a wider variety of applications beyond conventional image analysis. In their 2017 study, Kautz et al. employed a CNN to interpret data from wearable sensors, enabling automated monitoring of beach volleyball players. The CNN gave dramatically higher classification accuracy than other methods such as SVM, KNN, Gaussian, and DT [65]. Pappalardo and colleagues created a CNN to evaluate multivariate time series taken from automated performance and tracking systems worn by professional soccer players. Their approach enabled automatic feature extraction, which is an advantage over conventional methods of time series analysis. In addition, they developed an injury predictor that could be explained, which is a requirement for a model to be used in the real world [66]. Similarly, Chen et al. offered a method for transforming time series data obtained from player-worn sensors into two-dimensional images for analysis by a CNN. Notably, they validated their system using only acceleration data from a single sensor and were still able to reach satisfactory levels of classification accuracy [19]. In their 2020 study, Song and colleagues built an optimized CNN to predict and evaluate injuries sustained by volleyball players. They tested their system on data from several sports dimensions and found that it provided more accurate results than competing algorithms. In addition, they described a framework for integrating cloud-based deployment with the Internet of Things (IoT) [67]. An illustration of the framework of the CNN classifier used can be seen in Fig. 9. In a study published in 2019, Ma et al. also presented a CNN for the analysis of sports data that made use of a real-time cloud-based system and the IoT [68]. In their 2021 study, Ghazi and colleagues demonstrated how a CNN may be used to determine the peak maximum principal strain that occurs in traumatic head injuries. Using data from the National Football League, they attained more than ninety percent accuracy in predicting whether or not a player had sustained a concussion [69].
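The convolution and pooling operations that CNN layers alternate between can be written out directly for small arrays. The sketch below is illustrative only: it uses plain Python lists, "valid" borders, and non-overlapping max pooling, and the function names are hypothetical. Note that many deep learning libraries implement the convolution layer as cross-correlation, that is, without flipping the kernel.

```python
def convolve2d(image, kernel):
    """'Valid' 2-D convolution of a list-of-lists image with a smaller kernel."""
    kh, kw = len(kernel), len(kernel[0])
    # Flipping the kernel in both axes distinguishes convolution from cross-correlation.
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * flipped[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 3x3 "image" and 2x2 kernel.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
print(convolve2d(image, kernel))
```

In a trained CNN the kernel values are learned parameters, so the network discovers for itself which local patterns in the sensor or image data are worth detecting.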

I. Long Short-Term Memory (LSTM) Based Neural Networks
Both feed-forward and recurrent neural networks are trained using gradients. The "on/off" signals of the individual nodes that make up an ANN are influenced by these gradients. Depending on the data set and how the model's hyperparameters are tuned, gradients can grow or shrink uncontrollably and produce invalid (NaN) values. This problem is known as exploding and vanishing gradients, and several approaches have been proposed to address it. One of these approaches uses LSTM nodes (see Fig. 10), which create a constant error carousel (CEC) [70]. The CEC makes it possible for gradients to carry over from one time step to the next without changing. The later addition of a "forget gate" made it possible for the LSTM node to be reset, which further reduces gradient runaway [71]. Neural networks that combine different kinds of nodes enable powerful time series analysis. Although the primary use of LSTM nodes is time series analysis, their distinctive properties allow them to be combined with other methods to obtain superior results in prediction and classification tasks. In their 2021 work, Meng et al. merged a CNN with an LSTM so that the LSTM nodes could reliably analyze two-dimensional data. Using photographs of professional athletes, they stratified injury risk into four classes (no risk, low risk, medium risk, and high risk) and reached a classification accuracy of 97.0%, with a sensitivity of 95.70% and a specificity of 97.54% [34]. A hybrid architecture such as this one may ultimately yield more accurate algorithms.
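The role of the cell state and the forget gate described above can be sketched with a single forward step of a standard LSTM cell. This is a minimal NumPy illustration, not the implementation used in any cited study. The dimensions, the zero weights, and the bias values are assumptions chosen so that the gates are driven only by their biases, which makes the carry-over and reset behavior easy to see.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell with a forget gate.

    x      : (D,)        input at this time step
    h_prev : (H,)        previous hidden state
    c_prev : (H,)        previous cell state (the error carousel)
    W      : (4H, D + H) stacked gate weights
    b      : (4H,)       stacked gate biases
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c = f * c_prev + i * g       # additive update of the cell state
    h = o * np.tanh(c)
    return h, c

# Hypothetical setup: zero weights, so only the biases control the gates
D, H = 2, 3
W = np.zeros((4 * H, D + H))
x = np.ones(D)
c0 = np.array([1.5, -0.7, 0.2])

def make_bias(input_bias, forget_bias):
    b = np.zeros(4 * H)
    b[0:H] = input_bias
    b[H:2 * H] = forget_bias
    return b

# Forget gate close to 1, input gate close to 0: the cell state is carried along
h, c = np.zeros(H), c0.copy()
for _ in range(50):
    h, c = lstm_step(x, h, c, W, make_bias(-20.0, 20.0))
c_carried = c

# Forget gate close to 0: the cell state is reset in a single step
_, c_reset = lstm_step(x, np.zeros(H), c0, W, make_bias(-20.0, -20.0))
```

Because the cell state is updated additively and scaled only by the forget gate, it passes through 50 steps almost unchanged when the gate is open and collapses toward zero when the gate is closed.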

J. Deep Gaussian Covariance Neural Networks
A Gaussian process is a non-parametric stochastic process defined such that any finite collection of its random variables has a multivariate normal distribution (as illustrated in Fig. 11). Fundamentally, Gaussian processes are characterized by their second-order statistics, so specifying a covariance function is necessary to describe the behavior of the process completely. If a neural network is given a final layer of nodes that contain covariance functions, the Gaussian process hyperparameters can be treated as outputs of the network. This lets the neural network handle a simpler problem, namely the tuning of the Gaussian process hyperparameters, while the actual regression is left to the final layer of covariance functions [73]. In an article published in 2022, Rahlf and colleagues proposed the protocol for a prospective study that would use a deep Gaussian covariance network to investigate the relationship between environmental and internal factors contributing to runner injuries. At the time of publication, participant recruitment for this study was still underway [74]. Once completed, the study should deliver real-world data on the forecasting capabilities of this type of ANN.
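The division of labor described above, in which the covariance function carries out the regression once its hyperparameters are known, can be sketched with plain Gaussian process regression. In this minimal example the hyperparameters are fixed constants standing in for the values that a network would output in a deep Gaussian covariance network. The squared-exponential covariance, the toy sine data, and all names are assumptions made for illustration and are not taken from [73] or [74].

```python
import numpy as np

def sq_exp_cov(xa, xb, length_scale, signal_var):
    """Squared-exponential covariance between two sets of 1D inputs."""
    d = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, length_scale, signal_var, noise_var):
    """Posterior mean of a zero-mean Gaussian process at x_test."""
    K = sq_exp_cov(x_train, x_train, length_scale, signal_var)
    K += noise_var * np.eye(len(x_train))      # observation noise on the diagonal
    K_star = sq_exp_cov(x_test, x_train, length_scale, signal_var)
    alpha = np.linalg.solve(K, y_train)        # solves K @ alpha = y_train
    return K_star @ alpha

# Toy data (hypothetical): noiseless samples of a sine curve
x_train = np.linspace(0.0, 2.0 * np.pi, 8)
y_train = np.sin(x_train)
x_test = np.array([1.0, 2.5, 4.0])

# Stand-in for the hyperparameters a neural network would output
hyper = {"length_scale": 1.0, "signal_var": 1.0, "noise_var": 1e-6}

mean_test = gp_posterior_mean(x_train, y_train, x_test, **hyper)
mean_train = gp_posterior_mean(x_train, y_train, x_train, **hyper)
print(mean_test)
```

With a small noise variance the posterior mean passes almost exactly through the training points, so the quality of predictions elsewhere depends mainly on how well the hyperparameters are chosen, which is the task delegated to the network.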

K. Radial Basis Function (RBF) Neural Networks
Radial basis functions (see Fig. 12) make interpolation of multi-dimensional data possible by computing the Euclidean distance between each data point and a predetermined center point. These functions can also be used as activation functions inside a neural network [75,76]. Radial basis function networks may be used for a range of tasks, including regression and classification. In a study published in 2021, Xiang used an RBF-based neural network to forecast injuries. The author categorized the severity of injuries and validated the findings by sending questionnaires to experienced coaches [77]. Another study published in 2021 also proposes an RBF-based neural network to forecast sports injuries, dividing the risk of injury into three categories: low risk, at risk, and high risk [78]. Notably, the author attempted to identify the factors that contribute most to the overall risk of injury. Although both of these publications present an original concept, they are mostly methodological in nature and lack rigorous validation or large data sets.
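The distance-based activation described above can be sketched as a small RBF network with Gaussian basis functions and a linear output layer fitted by least squares. This is an illustrative sketch only: the toy data, the choice of one center per training point, the width value, and the function names are assumptions and do not correspond to the models in [77] or [78].

```python
import numpy as np

def rbf_design(X, centers, width):
    """Gaussian RBF activations for every (sample, center) pair."""
    # Squared Euclidean distance from each sample to each center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbf_network(X, y, centers, width):
    """Fit the linear output weights by least squares."""
    Phi = rbf_design(X, centers, width)
    weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return weights

def predict_rbf_network(X, centers, width, weights):
    return rbf_design(X, centers, width) @ weights

# Toy problem (hypothetical): an XOR-like target that no linear model can fit
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])
centers = X.copy()   # one center per training point
width = 0.5

weights = fit_rbf_network(X, y, centers, width)
pred = predict_rbf_network(X, centers, width, weights)
print(np.round(pred, 3))
```

Placing a center on every training point gives exact interpolation on this toy problem. On real injury data, fewer centers and a held-out validation set would be needed to avoid overfitting.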

L. Fuzzy and Neural Network
Fuzzy set theory assigns graded degrees of membership to the elements of a so-called fuzzy set. This contrasts with the "crisp", or dichotomous, membership assumed in classical mathematics [79]. According to grey theory, systems with no information are represented by black, and systems with complete information are represented by white. The vast majority of real systems are therefore grey, meaning that some information is missing. Several grey models have been developed to address this issue [80]. At their core, grey theory and fuzzy theory are both concerned with uncertainty in statistical analysis. Although they are mathematically distinct, they deal with comparable data sets, which is why they are treated in the same section here. In their 2021 study, Wang and colleagues demonstrate how a fuzzy neural network may be used to determine the severity of an injury sustained during athletic competition. They found that the fuzzy neural network performed much better than the Bayesian and Lagrange models; however, this was a theoretical proposal evaluated on simulated data [81]. In another study published in 2021, Zhang et al. proposed a grey neural network, which takes the outputs of n grey models as input and produces a final prediction using a neural network. Like the previous method, it was a theoretical proposal verified using simulated data [82]. The framework used in the fuzzy approach is shown in Fig. 13. Although neither of these papers is yet applicable in the real world, both show promising potential for combining fuzzy and grey theory with neural networks to deal with the inherent uncertainty in sports injury data.
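The contrast between crisp and fuzzy membership can be illustrated with a single variable. The sketch below compares a dichotomous threshold with a linear ramp membership function for a "high training load" set. The load scale, the threshold, and the ramp limits are hypothetical values chosen for illustration and do not come from [79] or [81].

```python
def crisp_high_load(load, threshold=70.0):
    """Dichotomous membership: a load is either in the set or not."""
    return 1.0 if load >= threshold else 0.0

def fuzzy_high_load(load, low=50.0, high=90.0):
    """Graded membership: rises linearly from 0 at `low` to 1 at `high`."""
    if load <= low:
        return 0.0
    if load >= high:
        return 1.0
    return (load - low) / (high - low)

# Hypothetical training loads on an arbitrary 0-100 scale
loads = [40.0, 60.0, 70.0, 80.0, 95.0]
crisp = [crisp_high_load(v) for v in loads]
fuzzy = [fuzzy_high_load(v) for v in loads]

for v, c, f in zip(loads, crisp, fuzzy):
    print(f"load={v:5.1f}  crisp={c:.0f}  fuzzy={f:.2f}")
```

A load of 60 is simply "not high" under crisp membership but belongs to the fuzzy set with degree 0.25, which is the kind of graded information a fuzzy neural network takes as input.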

IV. DISCUSSION
The comparative assessment of the various machine-learning techniques used in sports is presented in Table I in terms of sport, metrics, injury rate, and injury location.

V. CONCLUSION
Several problems remain with using ML as a form of predictive analytics in sports medicine. First, there are not enough standardized data sets on sports injuries, which makes it hard to test and validate new modeling approaches. Second, data are collected inefficiently, especially through monitors that are hard for players to wear. Third, studies are hard to compare because ML model designs differ and too little information is reported about how the algorithms are built. Finally, outdated or poorly performing models are sometimes used because they are easier to apply. For example, logistic regression is often treated as an ML algorithm because it can produce a categorical result. However, it does not adapt in the way other ML methods do, and current ML algorithms generally outperform it. Logistic regression models nevertheless continue to be used to make predictions, even though they often perform poorly. Many studies that try to predict injuries use only these older methods, which gives the impression that ML has little clinical use. This matters because it shows that research into the use of ML for sports injuries is still in its early stages and that ML could yet be applied productively in the future.
Possible solutions to the problems listed above include creating open-source, standardized data sets that can be adapted to the strengths of specific algorithms. The large amount of data that sports teams and sports broadcasting companies have access to, especially high-quality video footage, could be assembled into large databases and used to train CNNs in many ways. This approach would solve two of the problems listed above at once: it would give researchers a large set of accurate, uniform data with which to train and test their models, and it would eliminate the need for athletes to wear unreliable monitors to collect data. Another benefit of pose estimation-based prediction is that it is likely to generalize, making it easy to fine-tune networks already trained for different sports.
Reducing reliance on older regression models is another possible option. Logistic regression models can be useful, but they often perform poorly when used to predict sports injuries, which are complex and multifactorial. This holds across the literature as a whole, where logistic regression is a popular baseline for comparison, as noted in our discussion of the recent review by Bullock et al. Even though these older models remain useful, they should not be confused with machine learning models, and current ML models are more likely to be able to solve the especially hard problems in injury prediction. Despite the issues mentioned, this area holds a lot of promise. By carefully choosing algorithms and assembling adequate data sets, researchers can try out new ideas and keep pushing the limits of what ML can do to improve sports medicine.

Fig. 1. Analysis of articles published in Scopus for search word "artificial intelligence in sports"

Fig. 2. Scopus articles for search word "artificial intelligence in sports injury prediction"

Fig. 4. Before and after data classification using SVM classifier

Fig. 5. Sample model of decision tree algorithm used in injury detection

Fig. 6. Sample RF model with multiple DTs used in sports for injury detection

Fig. 9. CNN classifier used in sports for injury prediction

Fig. 11. Deep Gaussian covariance neural networks algorithm used in sports for injuries detection and prediction

Fig. 12. Radial basis function neural networks algorithm in sports

Fig. 13. Fuzzy and neural network algorithm models used in sports for injuries detection

TABLE I. REVIEW OF ARTIFICIAL INTELLIGENCE TECHNIQUES IN SPORTS