Artificial Intelligence Based Deep Bayesian Neural Network (DBNN) Toward Personalized Treatment of Leukemia with Stem Cells

The dynamic development of computer and software technology in recent years has been accompanied by the expansion and widespread implementation of artificial intelligence (AI) based methods in many aspects of human life. A prominent field of rapid progress is high-throughput methods in biology, which generate large amounts of data that need to be processed and analyzed. AI methods are therefore increasingly applied in the biomedical field, among others for RNA-protein binding site prediction, DNA sequence function prediction, protein-protein interaction prediction, and biomedical image classification. Stem cells are widely used in biomedical research, e.g., in studies of leukemia and other diseases. Our proposed Deep Bayesian Neural Network (DBNN) approach for the personalized treatment of leukemia achieved high test accuracy: the DBNN used in this study classified images with an accuracy of 98.73%. This study shows that the DBNN can classify cell cultures based only on unstained light microscope images, which allows their further use. Building a Bayesian-based model would therefore be of great help during commercial cell culturing, and is possibly a first step toward an automated/semi-automated neural network-based model for classifying good and bad quality cultures once images of such cultures become available.


INTRODUCTION
Leukemia is a type of cancer that affects the blood and bone marrow. The bone marrow of leukemia patients exhibits rapid, unchecked proliferation of abnormal cells, according to [1]. Unlike other cancers, leukemia usually does not form a mass (tumour) that can be detected by imaging tests such as X-rays. It arises from developing blood cells in the bone marrow. Hematopoietic stem cells are the progenitors of all blood cells and undergo several developmental stages before they mature. Blood cells proliferate and divide in the bone marrow to make red blood cells, white blood cells, and platelets, as explained in [2]. However, when a person suffers from leukemia, one of these types of blood cells begins to develop rapidly and out of control. These abnormal cells, known as leukemia cells, take over the space inside the bone marrow. Leukemia can now be identified by specialised cell tests, such as cytogenetics, immunophenotyping, and morphological cell categorization, but the drawback is that they require expert operators to scrutinize microscopic pictures of blood or bone marrow, which also leads to a substantial delay in the treatment procedure, as per [3]. Another issue is that these approaches are not employed in cases with regular symptoms and are therefore performed only in exceptional cases. The leukemia stem cell (LSC) is by far one of the most extensively studied cancer stem cells (CSC) and has served as a model system in the study of cancer development, according to [4]. Intuitively, the properties of CSC are linked to patient outcomes. Since LSC drive the onset, progression, and recurrence of leukemia from residual cells post therapy, their properties should be intimately linked to patient outcomes; yet to date, the clinical relevance of LSC has not been firmly established, according to [5,6].
The management of adult leukemia patients has been largely guided by baseline clinical parameters, including age at diagnosis, white blood cell (WBC) count, and whether disease presentation is de novo, treatment-related, or secondary to prior disease. Additionally, an ever-expanding set of specific karyotypes and gene mutations is being incorporated into patient risk stratification schemes, producing broadly favorable, intermediate, and adverse patient risk groups, as described in [7,8]. This standard treatment is clearly not curative for the vast majority of leukemia patients, as evidenced by the overall survival rate of < 50% according to [9]. Thus, potent anti-LSC therapies should be tested for their ability to induce deeper and more durable remissions in human leukemias. Labor-intensive approaches for identifying LSC-specific targets could be envisaged, involving large-scale high-throughput screens of drug/compound libraries, or the systematic perturbation of genes (i.e., knockout (KO), knockdown (KD), or overexpression (OE)), as mentioned in [10].
Leukemia has an incidence rate of ~1 in 100,000 individuals of age < 65 years, and ~12 in 100,000 individuals of age > 65 years. Prognosis has improved in more recent years due to advances in supportive care, but remains poor overall, particularly for patients of age > 65 years, where up to ~70% succumb to AML within one year from diagnosis, according to [11]. Although ~70% of patients achieve complete remission (CR) from initial induction chemotherapy, ~50% will also experience relapse. It is suspected that a rare subpopulation of leukemia stem cells (LSC) is insensitive to standard treatment, allowing them to survive and eventually drive relapse. LSC properties may be closely linked to patient outcomes, and relapse prevention is likely achievable through targeting the leukemia's compartment. LSC must therefore be able to maintain or increase their numbers through quiescence and self-renewal, while also sustaining a population of non-CSC cells that comprise the bulk of the tumor population, as stated in [12]. The tools and software libraries that can be utilized for research on leukemia are shown in Fig. 1. The biological variables are determined using tests such as tumor marker tests, gene-based diagnostic tests, liquid biopsy, robotic biopsy, imaging techniques and devices (X-ray, MRI, ultrasound, and CT scans), and virtual endoscopy, according to [13]. Many imaging tests involve high doses of radiation. The initial risk stratification is: T-cell, infant, high risk B-precursor, and standard risk B-precursor leukemia stem cells (LSC). After early treatment, patients are further stratified, which acts as a surrogate for outcome prediction. The main treatment for leukemia with stem cells involves chemotherapy; targeted therapy may be given to leukemic patients with the Philadelphia chromosome; and radiation therapy may be used as part of treatment to prevent the spread of leukemia to the central nervous system, according to [14].
A stem cell transplant may be offered to people with leukemia while they are in remission. Supportive therapy is given to treat the complications that usually happen with treatments for leukemia and the disease itself as per [15].
Computer-assisted models using techniques such as the Deep Bayesian Neural Network (DBNN) are used in predictive algorithms that support personalized treatment based on a combination of clinical and biological variables, and are also used when creating images for leukemic patients, as stated in [16]. In virtual endoscopy (a method used to diagnose cancer), computers create a 3-D view of the organ from several images; doctors use this 3-D view for further diagnosis with AI-based techniques to improve the chances of success, according to [17]. The growing importance of deep-learning methods, and especially of the DBNN, in the natural sciences has been observed in recent years. DBNN has been successfully used for localization and classification of bacterial cells, mitosis detection, classification of differentiating neural leukemia and stem cells differentiated into epiblast-like cells, and pathogen detection and identification, according to [18], but also in agriculture for pest or pathogen detection based on field images. The DBNNs used in these studies were able to classify images with accuracy exceeding 90%. In the not too distant future, according to [19], DBNN-based technology may at least partially replace qualified diagnosticians in the interpretation of medical images.

A. Problem Formulation
In this research paper, personalized medicine for leukemia with stem cells means tailoring medical treatment to each person's genetic composition within diseased cells (here, leukemia's cancerous cells), supported by a novel computer-assisted technique, the Deep Bayesian Neural Network (DBNN). The genetic material in cancer cells differs from that in healthy cells. Also, two people with the same type of cancer may differ in their cancer cells. These differences take the form of gene changes or of changes in the levels of certain cell proteins that work as messengers within and between cancer cells. Testing for the presence of such changes leads to the development of targeted drugs that block pro-cancer messenger proteins, according to [20]. The differences in cancer cells between individuals call for personalized treatment. The absence of a particular type of mutation in cancerous cells means that the drugs that target that mutation do not work. Leukemia can affect both young people and adults. In general, up to 20% of patients with a high risk of relapse are not cured. Age is an important factor affecting prognosis (the outlook of the patient). About 90% of patients survive for 5 years or more if diagnosed aged 14 or younger, whereas the survival rate drops drastically to just 15% for those aged 65 or older. Understanding the modularity and generality of an AI-augmented model can therefore be beneficial for leukemic patients.
i. Absence of both accurate classification of cancerous and noncancerous cells and useful uncertainty information regarding the model's predictions.
ii. How the use of statistics and Artificial Intelligence (AI) in fields such as medical science, combined with probabilistic modelling through the Bayesian approach, is the most efficient means of ensuring AI safety, since it provides information about uncertainty in leukemic patients.
iii. Lack of an AI-assisted Bayesian interpretation model for the personalized treatment of leukemic patients.
iv. Inference in probabilistic modelling requires theoretical efficiency and high computational effort, so the modelling must be carried out with the prescribed technique for leukemic patients.
High-throughput methods in biology generate large amounts of data that need to be processed and analyzed. Building a DBNN-based model that can precisely predict the differentiation status would be of great help during commercial leukemia stem cell (LSC) culturing, and is possibly a first step toward an automated/semi-automated DBNN-based model for classifying good and bad quality cultures once images of such cultures become available. A further question is how the DBNN-based model can be trained with simple light microscope images and then used to accurately predict the quality of stem cell cultures in independent, new samples. Therefore, these AI methods are

B. Contributions
The aim of the current study is to derive a computer-assisted DBNN-based model for processing and classifying leukemia with stem cell data that can be used in automated/semi-automated quality control for the personalized treatment of leukemia patients. Due to the previously mentioned lack of data for low vs. high-quality classification, a data set with early vs. late differentiation was used as a proxy data set when testing the model in the results section.
We intend to automate stem cell classification for leukemic patients by applying a DBNN-based model to their genetic profiles, that is, their genotypic and phenotypic gene expression level data, through multiple graphs. Our objective in this research is to develop a DBNN model that can classify with a lower error rate and high certainty. It is well known that some tasks are performed better by humans than by computers, while the opposite holds for other tasks.
The paper demonstrates the effectiveness of the DBNN approach on leukemia with stem cell data, which has frequently been the focus of automated/semi-automated quality control for the personalized treatment of leukemia patients. Additionally, we give insights into the training and validation of the DBNN model for the treatment of patients suffering from the disease. This research also highlights the use of statistics in medical science, combined with probabilistic modelling through the Bayesian approach, as discussed.

C. Background
The knowledge about future outcomes can also be expressed in terms of uncertainty. From a statistical point of view, for a classification problem, uncertainty is a bound formed by probability values about an outcome. Statistics has a wide spectrum of applications in many fields. Recent years have seen the early stages of integrating statistics and artificial intelligence to solve real-world problems ranging from finance, medicine, business, and life science to environmental science, as mentioned in [21]. Important decisions must be taken in these fields to achieve the intended goals. Deep learning has become a vital tool for making predictions about future outcomes in order to make better decisions. In a classification problem, a deterministic Deep Learning model produces a single probability, also known as a point estimate, given a single observation of the independent variable(s). For instance, in leukemia diagnosis, the frequentist approach in a Deep Learning model will generate a single probability value about whether a person has cancer or not. If a doctor prescribes medicine based on that single probability value, which could be associated with a type-I error in leukemia with stem cell data, the consequence could be devastating, according to [22]. The classes of leukemia are shown in Fig. 2.

Fig. 2. Types of leukemia classes prediction [21]

In contrast, a Deep Bayesian Neural Network (DBNN) model generates a distribution of outcomes given the input data. This probabilistic modeling in supervised Machine Learning or Deep Learning produces a confidence bound for each outcome of the dependent variable, which can be used to make realistic and effective decisions for treating leukemia with stem cell data in the medical field, according to [23]. The point estimate-based prediction in deterministic Deep Learning models does not give any indication of the possible variation in outcomes that could happen in reality.
This lack of information about uncertainty in predictive modeling can lead to unwanted situations. In a similar situation, a Bayesian Deep Learning model will generate a probability distribution over the possible outcomes, from which it is possible to estimate the mean and variance, as stated in [24]. In general, in a classification problem, we treat the mean of the predicted probabilities for leukemia with stem cell data as the predicted outcome, and the variance tells us about the uncertainty of the outcome in any given situation. In other words, the distribution illustrates the confidence bound about the outcome given the input data. From that confidence bound, we can observe how confident the model is in making each of the predictions for leukemia with stem cell data, as explained in [25].
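The mean-and-variance summary described above can be illustrated with a minimal sketch. It assumes that the Bayesian model has already been sampled several times for one input (for example through repeated stochastic forward passes); the function name `summarize_predictive` and the sample values are hypothetical and are not taken from the study.

```python
import numpy as np

def summarize_predictive(samples):
    """Summarize sampled class probabilities for ONE input.

    samples: array of shape (n_draws, n_classes); each row is one draw of
    class probabilities from the (assumed) Bayesian model.
    Returns the predicted class, the per-class mean, and the per-class variance.
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)          # treated as the predicted outcome
    variance = samples.var(axis=0)       # spread across draws = uncertainty
    predicted = int(np.argmax(mean))     # class with the highest mean probability
    return predicted, mean, variance

# Hypothetical draws for a binary problem (class 0 vs. class 1).
draws = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.7, 0.3],
                  [0.6, 0.4]])
label, mean, variance = summarize_predictive(draws)
print(label, mean, variance)
```

A large variance for the predicted class signals that the individual draws disagree, which is the situation in which a practitioner may want to defer to further testing rather than act on the mean alone.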
The Bayesian evidence framework can be used for model comparison in leukemia as well as to obtain the weight decay coefficients in an efficient way. In Bayesian analysis, uncertain quantities are expressed in terms of posterior probability distributions. Current leukemia cancer therapy is risk-based therapy, according to [26,27]. The patient is assigned to a certain risk group and is treated based on the treatment provided to that risk group. The current approach to risk stratification of leukemia relies on Artificial Intelligence (AI) predictive algorithms that use a combination of clinical and biological variables. Some of the clinical variables used are gender, age, and white blood cell count at the time of diagnosis, according to [28]. The purpose of using Deep Bayesian Neural Networks (DBNN) is to integrate the traditional NN with probabilistic modeling, since probabilistic modeling takes into account the uncertainty in the patient's treatment. Besides that, the Bayesian approach also provides the following advantages, as per [29].
i. We can explain the regularization, as opposed to a traditional NN.
ii. We can compare different models for treatment (e.g., models with different layers or prior
Early detection of leukemia symptoms in individuals can considerably improve their chances of survival. The approach of stem cell observation employing cytogenetics and immunophenotyping diagnostic procedures is now recommended for its high accuracy, as mentioned in [30].

II. METHOD
This study aimed to develop a Deep Bayesian Neural Network (DBNN)-based model for personalized treatment and classification of leukemia with stem cells and image data that can be used in automated/semi-automated quality control of leukemia for better computer-assisted treatment. DBNN allows classification of images at the pixel level by assigning each pixel in the image to an object class, rather than by focusing on certain features of the image as was the case in conventional machine learning approaches, where, e.g., the user first had to mark the features that should be recognized by a computer. DBNNs are already able to classify images with precision similar to or often better than that of the human eye. DBNN building and training were performed in Python 3.8.5 in the Anaconda 3 environment using Keras with TensorFlow 2.3.1 as a backend. Keras is a deep learning application programming interface running on top of the machine learning platform TensorFlow, as mentioned in [31]. The flowchart in Fig. 3 represents the step-by-step process.

A. DBNN Model Integration and Proof of Concept
The difficulty with such procedures is that they are slow and unstandardized, since they depend on the operator's capabilities and fatigue. The use of microscopic pictures to identify leukemia in human blood samples is fitting only for low-cost and remote diagnosis systems [32]-[36]. That is where new-age solutions come into play. Using deep Bayesian neural networks, various researchers have developed systems that provide a smooth and exceptionally accurate way to detect and classify different types of blood cancer. The process of the Bayesian deep neural network and the block structure of the model can be seen in Fig. 4 and Fig. 5.

1) Pooling layer:
We deploy pooling, a downsampling technique, to capture the spatial invariance properties of the data. For example, the shape or position of a leukemia may differ across images, and the network can get confused or miss key information about the tumor in such situations [37]-[47]. The pooling operation tries to ensure that the NN does not miss any important information about the data. There are several types of pooling operations, such as mean pooling and sum pooling. Max-pooling was applied in this study due to its speed and improved ability to merge in comparison to other methods, such as average pooling and L2-norm pooling [48]-[60]. The resulting feature maps in a Bayesian deep layer are called pooled feature maps in [28]. The performance of the model was then verified using a validation dataset and evaluated based on classification metrics (accuracy and loss). The DBNN model based on the TensorFlow tutorial (Fig. 6 and Fig. 7) was further tested using different hyperparameter settings and verified with the independent test dataset. Preparing such a library requires the involvement of highly qualified specialists, who screen the image collection and extract the features typical of each class to be classified in the particular study. Moreover, the classification accuracy of these models was often not satisfactory and rarely exceeded 80%. The dataset is available at https://www.kaggle.com/datasets/andrewmvd/leukemiaclassification.

3) Deep hierarchies:
The Bayesian operation assists in portraying the spatial information of the input data, and a DNN can provide very efficient results on such a task. Because of its architecture and functionality, it is especially useful in solving problems that are difficult to solve with other machine learning and deep learning models. DBNN works well when a dataset contains special features.
For instance, time series data contain seasonality, and a color image contains different pixel values across all the pixels, where each pixel represents a certain characteristic. Because of this, DBNN is applicable to problems involving time series, image, video, and text data. For instance, we can apply DBNN to MRI scan image data to classify patients with leukemia.
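The max-pooling operation described in the pooling-layer subsection can be illustrated with a minimal NumPy sketch. It assumes a single 2-D feature map and a non-overlapping 2x2 window; the function name `max_pool_2x2` and the example arrays are hypothetical, and the study itself used Keras layers rather than this hand-written version.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max-pooling of a 2-D feature map.

    Odd trailing rows/columns are dropped (an assumption of this sketch).
    """
    feature_map = np.asarray(feature_map)
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Group pixels into (h2, 2, w2, 2) blocks and take the max of each block.
    blocks = trimmed.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 2, 0, 1],
               [3, 4, 1, 0],
               [0, 0, 5, 6],
               [1, 0, 7, 8]])
pooled = max_pool_2x2(fm)
print(pooled)

# Small shifts inside one pooling window do not change the pooled output,
# which is the spatial-invariance property mentioned in the text.
a = np.zeros((4, 4)); a[0, 0] = 9
b = np.zeros((4, 4)); b[1, 1] = 9
same_after_pooling = np.array_equal(max_pool_2x2(a), max_pool_2x2(b))
print(same_after_pooling)
```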

B. DBNN Based Leukemia Classification with Data Augmentation
The frequentist approach to DBNN augmentation assigns a single layer output to each connection between the segmentation and classification stages. In the Bayesian approach, however, we assign an augmentation over each block, and this probabilistic modeling returns estimates of uncertainty for the treatment. Initially, we assign a prior distribution over each weight and, based on posterior inference over the blocks, we arrive at a decision about the weights. That is, Bayesian learning of the weights describes how we update our belief regarding the weights/parameters from prior knowledge to the posterior after observing the augmented data. However, DBNNs hold thousands or even millions of parameters. In such cases, the models are typically complex and the dimensions are much higher than we can even imagine, with multiple key colors.
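The prior-to-posterior update over the weights described above can be written in standard notation. The symbols below are assumptions introduced for illustration and do not appear in the original text: $w$ denotes the network weights, $\mathcal{D}$ the (augmented) training data, $(x^{*}, y^{*})$ a new input and its label, $q(w)$ an approximate posterior, and $T$ the number of weight samples.

```latex
% Bayes' rule for the weights: prior p(w), likelihood p(D | w), posterior p(w | D)
p(w \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid w)\, p(w)}{p(\mathcal{D})}

% Posterior predictive distribution and its Monte Carlo approximation
p(y^{*} \mid x^{*}, \mathcal{D})
  = \int p(y^{*} \mid x^{*}, w)\, p(w \mid \mathcal{D})\, \mathrm{d}w
  \approx \frac{1}{T} \sum_{t=1}^{T} p(y^{*} \mid x^{*}, w_{t}),
  \qquad w_{t} \sim q(w) \approx p(w \mid \mathcal{D})
```

Because the integral is intractable for networks with thousands or millions of parameters, practical methods typically replace the exact posterior with an approximation $q(w)$ and average over a finite number of sampled weight vectors.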

C. Training and Cross-Validation
A Bayesian deep learning model is built for each of the training set samples and is tested on the test set to initially produce a vector of regression values for cancer. The threshold that separates good outcomes from poor outcomes is then set manually to produce a vector of 0's and 1's. The loss/accuracy of the Bayesian model is evaluated by comparing the predicted labels with the original labels and by building confusion matrices. Methods to improve the generalization and accuracy of the model, such as Out-of-cross validation and epoch voting, are adopted. The Perceptron algorithm, also called the single-layer perceptron, is one of the earliest supervised training/cross-validation algorithms for binary, linear classifiers. It is a single-layer neural network. The tested model based on the TensorFlow tutorial proved to have the highest validation accuracy of all tested architectures, as high as 97%, and an independent test accuracy of around 96%. Based on the values of the classification metrics, this third model was used for further testing of different settings and hyperparameters.
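The thresholding and confusion-matrix evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `threshold_and_confusion`, the threshold of 0.5, and the score and label values are all hypothetical and are not data from the study.

```python
import numpy as np

def threshold_and_confusion(scores, labels, threshold=0.5):
    """Turn regression scores into 0/1 labels and build a 2x2 confusion matrix.

    Rows of the matrix index the true label, columns the predicted label.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    predicted = (scores >= threshold).astype(int)   # manually chosen cut-off
    cm = np.zeros((2, 2), dtype=int)
    for true_label, pred_label in zip(labels, predicted):
        cm[true_label, pred_label] += 1
    accuracy = np.trace(cm) / cm.sum()              # correct predictions / all
    return predicted, cm, accuracy

# Hypothetical model outputs and ground-truth labels.
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.20]
labels = [0, 0, 1, 1, 1, 0]
predicted, cm, accuracy = threshold_and_confusion(scores, labels, threshold=0.5)
print(predicted)
print(cm)
print(accuracy)
```

Moving the threshold trades false positives against false negatives, so in a clinical setting the cut-off would normally be chosen on validation data rather than fixed in advance.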
A Bayesian network of 4 layers followed by deep learning produced 90% accuracy, whereas DBNN followed by DNN produced 84% accuracy. The proposed dataset had 22283 features, the DBNN had the distribution 0.2-1.6, and the epochs ranged between 0 and 40. In terms of network structure, a DBNN is identical to a CNN, but the two are entirely different with respect to training (accuracy vs. epochs). The difference in the training method enables DBNNs to outperform their shallow counterparts. A CNN is trained by backpropagation, while a DBNN is trained by pre-training and cross-validation. Hidden layers, and hence deep learning, are necessary when dealing with highly contextual and complex datasets. Datasets that fall under this category include image recognition datasets and medical datasets that quantify the cellular interactions within organisms. Datasets with features in the form of discrete values (for example, blood group, race,  CNN and SVM (support vector machine), decision trees may not be able to unearth the intrinsic patterns within complex data for leukemia, according to [28]. DBNN employs non-linear transformations at each layer to extract these intrinsic patterns.

III. RESULTS AND DISCUSSION
In this part we present the results, the specific trial settings, the assessment metric, and the outcomes on each particular task. Our Bayesian deep NN approach provides vitally important uncertainty information that was previously missing in treatment. In the DBNN approach, the whole image is convolved with a set of filters and the feature maps are generated. The weights of these filters can be changed during training to optimize the classification process. The output of the convolution layer is subsampled and then passed as input to the next layer. The predictive uncertainty generated by our models in the DBNN setup provides the confidence bound for each prediction. This information is of vital importance, since it tells the practitioner about the probable variation in outcomes that could happen in reality for leukemia with stem cell data. The segmentation and classification of the different classes of leukemia samples can be seen in Fig. 8. The DBNN obtained an accuracy of 98.73%, compared with 89.40% and 79.58% for a CNN and an SVM, respectively. An autoencoder with 3 hidden layers was used with a softmax regression layer on neuroimaging data to predict leukemia types. We performed binary and 5-class classification of leukemia, the 5 classes being basophil, eosinophil, lymphocyte, monocyte, and neutrophil.
Images of leukemia classes were obtained using a fluorescence microscope (magnification 10x) with an ANDOR Zyla sCMOS digital camera and then processed using the NIS-Elements software package (version 4.30). Images were saved as jpg files. Each image was labeled as 'early differentiation' (ED), for images taken during the first 14 days of cell culturing, or 'late differentiation' (LD), for images taken on the 16th day of culturing and later (Fig. 9). Images, however, were not taken daily for each experiment, due to, e.g., weekends. Therefore, no or very few images were available for some days of culturing. It is plausible that the deepest network will give the most accurate results, as assessed using the confusion matrix. For this purpose, we aim to maximize the parameters of our Bayesian Deep Learning models. We also conduct experiments to decide which hyperparameters to employ, and then find the optimal values of those hyperparameters (confusion matrix example for class-wise treatment) (Fig. 10). The datasets used for leukemia were split into training, validation, and independent test sets in a ratio of 7:2:1. To improve the efficiency of training, the number of images was increased by random flipping (vertical and horizontal) and random rotation. The results of training the DBNN on the training set were used for backward propagation, which adjusts the weights of the neurons inside the neural network with the aim of optimizing the personalized treatment for leukemia and the doctor's ability to properly classify the images. Moreover, the Keras built-in visualization of training set learning accuracy and loss was used to assess over-fitting (Table 1).
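The 7:2:1 split and the flip/rotation augmentation described above can be sketched with NumPy. This is an illustrative sketch, not the pipeline used in the study: the helper names `split_indices` and `augment` are hypothetical, the fixed seed is chosen only for reproducibility, and rotation is restricted to multiples of 90 degrees as a simplifying assumption (the text does not state the rotation angles used).

```python
import numpy as np

def split_indices(n_samples, rng, ratios=(0.7, 0.2, 0.1)):
    """Shuffle sample indices and split them into train/validation/test parts."""
    order = rng.permutation(n_samples)
    n_train = int(round(ratios[0] * n_samples))
    n_val = int(round(ratios[1] * n_samples))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]        # remainder goes to the test set
    return train, val, test

def augment(image, rng):
    """Random vertical/horizontal flip followed by a random 90-degree rotation.

    Note: for non-square images, odd multiples of 90 degrees swap height and width.
    """
    if rng.random() < 0.5:
        image = np.flipud(image)          # vertical flip
    if rng.random() < 0.5:
        image = np.fliplr(image)          # horizontal flip
    k = int(rng.integers(0, 4))           # 0, 90, 180, or 270 degrees
    return np.rot90(image, k)

rng = np.random.default_rng(0)
train_idx, val_idx, test_idx = split_indices(100, rng)
print(len(train_idx), len(val_idx), len(test_idx))

image = np.arange(16).reshape(4, 4)       # stand-in for a grayscale image
augmented = augment(image, rng)
print(augmented.shape)
```

Augmentation should be applied only to the training portion, so that the validation and independent test sets remain untouched images.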

IV. CONCLUSION
This research proposes a personalized, automated Deep Bayesian Neural Network (DBNN) technique to support the treatment of leukemia using microscopic pictures of blood and stem cells, achieving accuracy levels that surpass those of practicing physicians. The healthcare system can benefit significantly from this technology, as it can provide access to medical imaging knowledge in sparsely populated and remote locations of the world that have fewer medical personnel. The proposed setup describes a method for segmenting stained blood smear and stem cell images using Bayesian Neural Network Clustering under Image Processing to retrieve the nucleus region and cytoplasm area. The ultimate goal would be to evaluate the model using images of leukemia with stem cells of good and bad quality. Such a model would significantly contribute to the improvement and reinforcement of high-performance systems for future stem cell-based applications. Such models need the greater computational power of specialized computational hubs, but they are also often able to recognize features that would be impossible to identify with the human eye.