Forecasting of The Number of Schizophrenia Disorder by using The Box-Jenkins of Time Series Analysis

Schizophrenia is a complex brain disorder that leaves sufferers unable to distinguish reality from imagination. This study aims to determine the parameters of the best Box-Jenkins time series model for predicting the number of schizophrenia patients in Cirebon City in 2018, judged by the smallest MSE (Mean Square Error) value. The study uses the Box-Jenkins method (often referred to as the ARIMA method), with data gathered through documentation and a literature study. The documentation collected data on the number of schizophrenia patients from January 2014 to December 2017. The data were analyzed in several stages: identification of data stationarity, estimation of model parameters, model verification, and forecasting. The results show that the best model for predicting the future number of schizophrenia patients is ARIMA (0,1,1). The forecast numbers of schizophrenia patients in Cirebon City from January to December 2018 are, respectively, 69, 68, 68, 68, 68, 68, 67, 67, 67, 67, 67, and 67 people.


I. INTRODUCTION
The word mathematics comes from the Greek "mathema" or "manthanein", which means learning or things that are learned, whereas in Dutch it is called "wiskunde", or exact science. Etymologically, then, the term mathematics means knowledge obtained by reasoning [1].
Mathematics is one of the kinds of knowledge needed in human life [2]. Many things around us relate to mathematics, such as finding someone's house number, making a telephone call, and many more. As times change, mathematics continues to develop and has become a fundamental subject that needs to be studied in every discipline.
Problems that arise in community life often need mathematical solutions. Governments today are confronted with a wide range of unexpected issues, such as natural disasters and the increasing number of patients with certain diseases. One example is schizophrenia, a disorder the community is still unfamiliar with; the public is more familiar with the terms "crazy" or "mental disorder". The government must therefore be ready to respond quickly to these various problems.
A mental disorder is a collection of abnormal conditions, both physical and mental, that hinder sufferers in adjusting to and behaving within their environment. According to Patel, mental disorders take many forms, from common psychiatric disorders such as depression and anxiety to severe psychiatric disorders such as schizophrenia and acute psychosis. This paper discusses only one of these disorders, namely schizophrenia.
Schizophrenia is a severe and long-lasting psychological disorder characterized by disturbed thought processes. Schizophrenia involves a split between an individual's personality and reality, not the appearance of multiple personalities in one individual [3].
According to Riskesdas, 0.17% of Indonesia's population, or more than 400 thousand people, experience schizophrenia. The highest prevalence was found in the provinces of Yogyakarta and Aceh, at 0.27%, and the lowest in West Kalimantan province, at 0.07%; 12 other provinces have a prevalence of severe mental disorders exceeding the national figure of 0.17% [4].
One branch of mathematics that addresses the problem above is statistics. Whether we realize it or not, research today relies on statistics, and the decisions taken in research are based on it. Statistics is therefore much needed in conducting research and in decision making. Statistics offers many methods, one of which is forecasting.
Forecasting is generally based on past data that have been analyzed using a particular method. Uncertainty is inherent in forecasting; to minimize forecasting errors, this study uses the Box-Jenkins time series analysis forecasting method. The time series models used are AR, MA, ARMA, and ARIMA. This modeling approach is particularly useful when little knowledge is available on the underlying data [5].
This study looks for the appropriate parameters of the best Box-Jenkins time series model for predicting the number of people with schizophrenia in Cirebon City, judged by the smallest MSE value, and reports the resulting forecasts. A time series method predicts future values from the past values of a variable. Time series data are the values of a variable recorded sequentially over time, such as by day, week, month, or year. The data used in this study are the number of people with schizophrenia from January 2014 to December 2017.
To apply this forecasting concept, the research was carried out at the Cirebon City Health Department. The research obtained data on the number of schizophrenia patients in previous years, which are then used to forecast the coming year, so that the results can serve as a reference for the health department in designing counseling programs for handling schizophrenia [7].
Green conducted research on the analysis of stock price time series using the Box-Jenkins approach and found that stocks do not behave in a particular way according to their industry; their behavior is related more to the individual company and to how much the stock price is influenced by factors that cannot be measured [8].
Sutarti conducted research on textile production forecasting at PT. Primatekno Indonesia, Batang Regency, using the ARIMA model and found that the best model was ARIMA (1,1,1), with the smallest MSE value of 104860 and forecasting results for January to December 2009 of, respectively, 3143. 28 [9].
The present study aims to find the parameters of the best model for predicting the future number of schizophrenia patients, judged by the smallest MSE value.

III. RESEARCH METHODS
The research method is the method used by researchers in collecting research data [28]. To obtain comprehensive results, the right research method must be used; this study therefore uses the Box-Jenkins method. The model proposed by the researcher can be seen in Figure 1.

The first step was to collect data on the number of schizophrenia patients for the past four years from the Cirebon City Health Office. Once the data were obtained, the next step was data processing, which consists of four stages.

The first stage is data identification, which consists of stationarity tests in the variance and in the mean. Stationary means that there is no drastic change in the data, or that the data fluctuate around a constant average value [29]. If the data are stationary, the analysis proceeds to the next stage. If the data are not stationary in variance, a Box-Cox transformation is carried out using the formula [30], given here in its standard form:

$$T(Z_t) = \frac{Z_t^{\lambda} - 1}{\lambda}, \quad \lambda \neq 0 \tag{1}$$

and if they are not stationary in the mean, differencing ($d$) is carried out using the formula:

$$\nabla Z_t = Z_t - Z_{t-1} \tag{2}$$

After the data are declared stationary, a suitable temporary model is sought that is considered to fit the data obtained. The proposed model is the ARIMA (p, d, q) model, which modifies the ARMA (p, q) model by adding a differencing operator so that the data meet the stationarity condition. The general form of ARIMA (p, d, q) is stated as follows:

$$\phi_p(B)(1 - B)^d Z_t = \mu + \theta_q(B)\, a_t$$

where $B$ is the backshift operator, $\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p$, $\theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q$, and $a_t$ is the residual at time $t$. The ARIMA (0,1,1) time series model is:

$$Z_t = \mu + Z_{t-1} + a_t - \theta_1 a_{t-1} \tag{6}$$

The second stage is parameter estimation, which includes the parameter significance test with the hypotheses $H_0$: the parameter is equal to zero (not significant) and $H_1$: the parameter is not equal to zero (significant).

The third stage is model verification, which consists of the residual independence test and the residual normality test.
The residual independence test is done by analyzing the residual ACF and PACF plots produced by the model: if no lag in the residual ACF and PACF plots crosses the significance boundaries, the residuals are independent (uncorrelated) [31]. Besides inspecting the residual ACF and PACF plots, residual independence can also be tested by comparing the P-value in the output of the Ljung-Box-Pierce procedure with the significance level [32]. The Ljung-Box-Pierce formula, in its standard form, is:

$$Q = n(n+2) \sum_{k=1}^{K} \frac{\hat{r}_k^2}{n - k} \tag{7}$$

where $n$ is the number of observations, $K$ is the maximum lag, and $\hat{r}_k$ is the residual autocorrelation at lag $k$.

The residual normality test is done by analyzing the histogram plot: the residuals meet the normality test if the histogram takes the form of a normal curve. If more than one model is adequate for forecasting, the best model is chosen by the smallest MSE value, with the formula:

$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} \left(Z_t - \hat{Z}_t\right)^2 \tag{8}$$

where $Z_t$ is the actual value and $\hat{Z}_t$ is the fitted value at time $t$.

The last stage is forecasting: once the best model has been obtained, it is used to forecast the future number of schizophrenia patients in Cirebon.
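The two verification quantities described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard Ljung-Box statistic and the usual mean of squared errors; the function names `ljung_box_q` and `mean_squared_error` and the sample numbers are hypothetical and are not taken from the study's tables. The P-value itself would come from a chi-square distribution, which is left out here because it needs a statistics library.

```python
def ljung_box_q(residual_acf, n_obs):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..K} r_k^2 / (n - k).

    residual_acf: residual autocorrelations r_1..r_K (lag 1 first).
    n_obs: number of residuals n behind those autocorrelations.
    """
    if n_obs <= len(residual_acf):
        raise ValueError("need more observations than lags")
    total = sum(r * r / (n_obs - k)
                for k, r in enumerate(residual_acf, start=1))
    return n_obs * (n_obs + 2) * total


def mean_squared_error(actual, fitted):
    """Average of the squared differences between actual and fitted values."""
    if not actual or len(actual) != len(fitted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)


# Hypothetical numbers, used only to exercise the two functions.
q_stat = ljung_box_q([-0.75], n_obs=4)
mse_value = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(q_stat, mse_value)
```

A small Q (large P-value) is consistent with uncorrelated residuals, and among the models that pass verification the one with the smallest MSE is preferred.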

IV. RESULT AND DISCUSSION
Data for this study were retrieved from the Cirebon City Health Office. The data analyzed are the number of people with schizophrenia from January 2014 to December 2017 (Figure 2).
To determine whether the data are stationary in variance, the Box-Cox test is performed. The Box-Cox test results for the data in Figure 2 can be seen in Figure 3. The data are not yet stationary in variance because the rounded value (lambda) is not equal to 1, so they need to be transformed according to that rounded value. Because the rounded value (lambda) in Fig. 3 is 0.50, the data in Figure 2 are transformed using equation (1) with $\lambda = 0.50$, where $Z_t$ is the actual data. The results of the Box-Cox transformation using equation (1) can be seen in Table 1 and Figure 3. After the transformation the rounded value (lambda) is 1, so it can be concluded that the patient data are stationary in variance.

Stationarity in the mean is identified from the plot of the actual data and its trend, followed by the plots of the autocorrelation function (ACF) and partial autocorrelation function (PACF). The actual data and trend for the number of people with schizophrenia are shown in Figure 4. The trend plot shows that the number of schizophrenia patients has decreased and that the actual values are still far from the linear trend line, so the data are not stationary; this can be examined further in Figure 5. Based on the ACF and PACF plots, not a single lag falls outside the dashed line. This shows that the data contain neither an autoregressive (AR) process nor a moving average (MA) process, so the data are declared not stationary. Differencing is therefore needed to make the data in Figure 2 stationary. Differencing computes the change in the actual data by subtracting the previous observation from the current one, using equation (2).
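The transformation and differencing steps described above can be sketched as follows. This is a minimal sketch that assumes the standard Box-Cox form $(Z_t^{\lambda} - 1)/\lambda$ (natural logarithm when $\lambda = 0$) and ordinary first differencing; the function names and the `counts` series are hypothetical and do not reproduce the study's data.

```python
import math


def box_cox(series, lam):
    """Standard Box-Cox transform; uses the natural log when lam == 0."""
    if any(z <= 0 for z in series):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(z) for z in series]
    return [(z ** lam - 1.0) / lam for z in series]


def difference(series, d=1):
    """Apply d rounds of first differencing: Z_t - Z_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [cur - prev for prev, cur in zip(out, out[1:])]
    return out


# Hypothetical monthly counts, chosen only so the arithmetic is easy to follow.
counts = [100, 81, 64, 49]
transformed = box_cox(counts, 0.5)     # lambda = 0.50, as in the Box-Cox result
first_diff = difference(transformed)   # d = 1
print(transformed, first_diff)
```

Each round of differencing shortens the series by one observation, so 48 monthly values leave 47 first differences.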
The results of the first difference can be seen in Table 2. The next step is to plot the data in Table 2 again. Based on Fig. 6, the first-difference data are stationary, because the average number of schizophrenia patients no longer drifts over time and the actual values lie close to the linear line. The next step is to analyze Figure 7 to identify a model suitable for forecasting. The ACF plot in Fig. 7 shows that the autocorrelation at lag 1 falls outside the significance boundary, at -0.51, while the lag 2 value is close to zero, at 0.01, and the PACF plot in Fig. 7 forms a sine wave. This means that the data follow a moving average (MA) process of order 1, which supports q = 1.
The PACF plot in Fig. 7 also shows that lag 1, lag 2, and lag 3 fall outside the significance boundary, while the ACF plot decays exponentially to 0. This indicates an autoregressive (AR) process of order 3, which supports p = 3.
Based on the model identification through the ACF and PACF plots, there is a moving average (MA) process of order 1 (q = 1), an autoregressive (AR) process of order 3 (p = 3), and differencing of d = 1, so the candidate ARIMA (p, d, q) models are ARIMA (3,1,1), ARIMA (3,1,0), and ARIMA (0,1,1).
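The identification step reads the ACF plot against its significance boundary, which can be sketched as below. The sketch assumes the common large-sample approximation of ±1.96/√n for the boundary, which may differ from the lag-dependent limits drawn by statistical software, and it assumes n = 47 first differences (48 monthly observations less one). The function names are hypothetical.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((v - mean) ** 2 for v in series)
    if c0 == 0:
        raise ValueError("series has zero variance")
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def significant_lags(acf_values, n_obs, z=1.96):
    """Lags whose autocorrelation lies outside the approximate +/- z/sqrt(n) bound."""
    bound = z / math.sqrt(n_obs)
    return [k for k, r in enumerate(acf_values, start=1) if abs(r) > bound]


# ACF values at lag 1 and lag 2 as reported for the differenced series;
# n_obs = 47 is an assumption (48 months minus one lost to differencing).
flagged = significant_lags([-0.51, 0.01], n_obs=47)
print(flagged)
```

Under these assumptions only lag 1 is flagged, which is consistent with the MA(1) reading of the ACF plot.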
With the temporary models known from the stationarity identification, the next step is to estimate the parameters of each model. The parameter estimates for each model are given in Table 3. Based on Table 3, all ARIMA models have $|t| > t_{\alpha/2}(n-1)$, so $H_0$ is rejected. All three ARIMA models therefore have significant parameters and can proceed to the model verification stage.

The model verification phase tests the feasibility of the model using two tests, namely the residual independence test and the residual normality test. The residual independence test examines the residual ACF and PACF plots and compares the Ljung-Box-Pierce P-value obtained from equation (7) with the significance level: if the P-value > 0.05, the ARIMA model meets the residual independence test. The residual normality test analyzes the histogram plot to confirm that the model is suitable for forecasting. The verification results can be seen in Table 4.

Based on Table 4, two ARIMA models satisfy both the residual independence test and the residual normality test: ARIMA (3,1,0) and ARIMA (0,1,1). The best of the two is the one with the smallest Mean Square Error (MSE). The MSE values of the ARIMA models, computed using equation (8), can be seen in Table 5. Based on Table 5, the MSE of the ARIMA (0,1,1) model is smaller than that of the other ARIMA models, so the ARIMA (0,1,1) model is used for the forecasting phase.

The verification result for ARIMA (0,1,1) can be seen in Figure 8. The lags of the residual ACF and PACF plots stay within the significance boundaries of the residual correlation, so it can be concluded that the ARIMA (0,1,1) model is suitable for the next stage, namely forecasting. For the residual independence test, the P-value in the Ljung-Box output is compared with a tolerance level of 5%. Based on Table 4, the Ljung-Box value for the ARIMA (0,1,1) model at residual lag 12 satisfies the random process because the P-value exceeds the tolerance level (0.926 > 0.05), as do lags 24 and 36. It can therefore be concluded that the residuals satisfy the random process, so $H_0$ is accepted and the ARIMA (0,1,1) model is suitable for use.

In Fig. 9 it can be seen that the histogram plot forms a normal curve because it follows the direction of the diagonal line on the histogram graph, so the histogram plot meets the normality assumption. Based on all the tests carried out, the ARIMA (0,1,1) model fulfills the residual independence and residual normality requirements and is suitable for forecasting.

After obtaining the best model, ARIMA (0,1,1), the next step is to form its equation from the parameter estimates in Table 6. The ARIMA (0,1,1) model uses equation (6); with the future shock $a_t$ replaced by its expected value of zero, the forecasting form is $\hat{Z}_t = \mu + Z_{t-1} - \theta_1 a_{t-1}$, with parameter values $\mu = -0.192$ and $\theta_1 = 1.0308$, so the equation of the ARIMA (0,1,1) model is $\hat{Z}_t = -0.192 + Z_{t-1} - 1.0308\, a_{t-1}$.
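The forecasting equation above can be turned into a short recursion. This is a sketch only: it assumes the initial residual is zero, that residuals are the one-step errors $a_t = Z_t - \hat{Z}_t$, and that future shocks are replaced by zero. The function name and the input series are hypothetical, so the printed numbers are not the study's forecasts; if the model was fitted on Box-Cox transformed data, the outputs would also need to be transformed back.

```python
def arima011_forecast(series, mu, theta1, horizon):
    """One-step fits and multi-step forecasts for
    Zhat_t = mu + Z_{t-1} - theta1 * a_{t-1}, with a_t = Z_t - Zhat_t.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    residual = 0.0  # assumption: the shock before the first observation is zero
    fitted = []
    for prev, cur in zip(series, series[1:]):
        pred = mu + prev - theta1 * residual
        fitted.append(pred)
        residual = cur - pred  # one-step error feeds the next prediction

    forecasts = []
    last = series[-1]
    for _ in range(horizon):
        pred = mu + last - theta1 * residual
        forecasts.append(pred)
        last = pred
        residual = 0.0  # future shocks are replaced by their expected value
    return fitted, forecasts


# Hypothetical series; mu and theta1 are the estimates quoted in the text.
demo_series = [75.0, 73.0, 74.0, 71.0, 70.0]
fitted, forecasts = arima011_forecast(demo_series, mu=-0.192, theta1=1.0308, horizon=12)
print([round(f, 2) for f in forecasts])
```

After the first forecast step only the constant remains, so each later forecast falls by 0.192 from the previous one, which matches the gradual decline described in the results. A value of $\theta_1$ above 1 lies outside the usual invertibility region $|\theta_1| < 1$, so the residual recursion can amplify early errors and its output should be read with care.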
The forecast numbers of schizophrenia patients from January to December 2018 were 69, 68, 68, 68, 68, 68, 67, 67, 67, 67, 67, and 67 people. Based on these results, the number of people with schizophrenia in Cirebon City in January 2018 is estimated at around 69 people. The forecast number of people with schizophrenia decreases over time. This may be because the downward trend in the original data causes the forecast values to decrease as well. The forecast numbers can be seen in Figure 10.