Predicting the Readiness of Indonesia Manufacturing Companies toward Industry 4.0: A Machine Learning Approach

This research discusses Indonesia's readiness to implement industry 4.0. We classified the industry 4.0 readiness of the Indonesian manufacturing companies listed on the Indonesia Stock Exchange, based on their 2018 annual reports. We considered 38 variables from those reports and reduced them to 11 variables using principal component analysis. Using clustering analysis on the reduced dataset, we found three clusters representing the readiness level in implementing industry 4.0. Finally, we used a decision tree to analyse the classification rules. The main finding of this study is that Total book value of the machine is the variable that defines the readiness of a company for industry 4.0: the larger this value, the more ready a company is to compete in industry 4.0. The other measures, i.e., Total cost of revenue by total revenue; Direct labor cost; Total revenue/Total employee; and Transportation cost/Total revenue, determine whether a manufacturing company is ready or not ready to transform into industry 4.0.


Introduction
The term "Industry 4.0" was publicly introduced in 2011. It was introduced as "Industrie 4.0" by a German group of representatives from different fields under an initiative to enhance German competitiveness in the manufacturing industry [1]. Whereas in the third industrial revolution production was automated through the application of electronics and information technology (IT), the fourth industrial revolution combines them with the latest smart technology [2]. Five technological advances support industry 4.0: the internet of things (IoT), artificial intelligence (AI), human-machine interfaces, robot and sensor technology, and 3D printing. These technologies increase automation and improve machine-to-machine communication. With smart technology, smart machines can analyse and diagnose issues without the need for human intervention [3].
Indonesia needs to be well prepared to join the Industry 4.0 era. Therefore, in April 2018, the President of the Republic of Indonesia, Mr Joko Widodo, launched the roadmap for Industry 4.0, called "Making Indonesia 4.0". The priority sectors included in the roadmap are food and drinks, automotive, textile, electronics and chemical [4].
Since it will help to identify the challenges faced. It also helps to determine the strategies and policies of the government to encourage the manufacturing sector to adapt to the changes in industry 4.0 [5].
1 Faculty of Industrial Technology, Department of Industrial Engineering, Petra Christian University, Jl. Siwalankerto 121-131 Surabaya, Indonesia 60238. Email: sea.remus@gmail.com, halim@petra.ac.id.
Most of those studies used assessment tools ([8,9]; INDI 4.0), model frameworks and structural equation modelling ([9,10,12]); they therefore collected data through questionnaires or focus group discussions. In this study, we do not assess the readiness of industries using assessment tools. Instead, we developed a model to predict the readiness of industry toward Industry 4.0 using machine learning. In this approach, first, we explored the variables used to model the readiness of Indonesian manufacturing companies toward Industry 4.0. As the starting point, we used the five pillars introduced by the Minister of Industry of the Republic of Indonesia (MIRI) [13] as the benchmark. Those five pillars are (1) management and organization; (2) people and culture; (3) factory operations; (4) product and service; (5) technology. They are similar in certain aspects to the country readiness benchmark for industry 4.0 introduced by the World Economic Forum (WEF). The WEF considers twelve pillars in assessing a country's readiness toward industry 4.0. Those twelve pillars are [14]: (1) Institutions; (2) Infrastructure; (3) ICT adoption; (4) Macroeconomic stability; (5) Health; (6) Skills; (7) Product market; (8) Labour market; (9) Financial system; (10) Market size; (11) Business dynamism; and (12) Innovation capability. By considering the MIRI pillars and the WEF pillars, we arrived at eight latent variables to measure the readiness of Indonesian manufacturing companies toward industry 4.0. Those eight latent variables are listed in Table 1. Second, we studied the annual reports of the manufacturing companies listed on the Indonesia Stock Exchange (IDX). Those annual reports provide indicator variables that we can use to measure the readiness of the manufacturing industry toward industry 4.0; in total, there are 36 variables (see Table 1). Third, we mined the dataset using machine learning.
In this approach, the first machine learning step is to pipeline the dataset in order to obtain a clean and well-prepared dataset. On the clean dataset, we reduced the number of variables using principal component analysis for a mixed (numeric and categorical) dataset. Clustering analysis is the next step, used to explore the levels of readiness of Indonesia's manufacturing companies toward industry 4.0. Given the number of readiness levels, we applied the decision tree algorithm and a validation check to obtain the classification rules. Using the classification rules, we can predict the readiness of a manufacturing company from the variables that we propose in this study.
Our study contributes a rule for measuring the readiness of a manufacturing company in Indonesia toward industry 4.0. Our approach produces three main findings. First, using principal component analysis, we defined the variables that measure the readiness of a manufacturing company toward industry 4.0. Second, using clustering analysis, we found three levels of readiness of manufacturing companies in implementing industry 4.0: not ready, in transition, and ready to transform into industry 4.0. Third, the decision tree shows that the readiness of a manufacturing company toward industry 4.0 can be predicted through these variables: Total book value of machine; Total cost of revenue by total revenue; Direct labor cost; Total revenue/Total employee; and Transportation cost/Total revenue.

Data Set
The dataset was collected from the annual reports of 130 manufacturing companies listed on the IDX. In total, 38 variables are used in this study; 36 of them are listed in Table 1, and the other two describe the manufacturing type. Most of the variables in Table 1 are numeric; only three of them are categorical and need explanation. Flexibility in product customization is measured by looking at whether a company is able to produce mass components consistently [15]. This variable takes the value 0 (none) or 1 (has flexibility in product customization). An outsourcing indication is recorded when the report mentions it, usually as maklun, subcontract, outsourcing, tenaga luar and the like. This variable takes the value 0 (none) or 1 (indicates outsourcing). Since only 10 out of 130 companies are not flexible, and only fourteen reported using outsourcing in their annual report, we omit those two categorical variables in this study (see Table 2).

Cleaning the Dataset
An essential step in machine learning is cleaning the dataset. Because the data were collected from the annual reports of various companies, not every variable is reported in every annual report, so some values are missing. The highest percentage of missing values occurred in utility expenses (22.31%). Because utility expenses are highly correlated with other variables (e.g. direct labor cost, transportation cost), we can drop this variable. Two variables were deleted owing to their high percentage of missing values: utility expenses and utility expenses divided by total revenue. The other variables with missing values are listed in Table 3. Those variables were imputed using the classification and regression trees (CART) approach [16].
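The screening step above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 20% cut-off, and the toy counts (29 missing values out of 130, chosen only so that the share rounds to 22.31%) are assumptions made for illustration, since the paper states no explicit threshold.

```python
def missing_percentage(column):
    """Return the percentage of entries in `column` that are None."""
    if not column:
        raise ValueError("column must not be empty")
    missing = sum(1 for value in column if value is None)
    return 100.0 * missing / len(column)


def split_by_missingness(table, threshold_pct):
    """Partition variable names into (kept, dropped) by missing-value share."""
    kept, dropped = [], []
    for name, column in table.items():
        if missing_percentage(column) > threshold_pct:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Toy data (hypothetical): 130 companies, 29 without reported utility expenses.
toy = {
    "utility_expenses": [None] * 29 + [1.0] * 101,
    "direct_labor_cost": [None] * 5 + [2.0] * 125,
}
# The 20% cut-off is an assumed value, not one given in the paper.
kept, dropped = split_by_missingness(toy, threshold_pct=20.0)
print(kept, dropped)
```

Variables that are kept but still contain missing values would then go to an imputation step, which this sketch does not cover.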

Data Reduction
In total, we now have 34 variables, and it is well known that a large number of variables is difficult to interpret. Principal Component Analysis (PCA) helps reduce the dimensionality of our dataset and increase its interpretability while, at the same time, minimizing information loss [17]. Since the dataset consists of numerical and categorical data, in this study we separated the numerical dataset from the categorical one. We then applied the PCA method to the numerical dataset. The PCA is implemented using the R package psych [18]. The PCA reduces the dimensionality of the dataset from 31 numerical variables to eight variables (see Figure 1). However, the new variables are usually linear functions of all 31 original variables, so it is not easy to interpret the meaning of the reduced variables. Several adaptations of PCA have been suggested to make the new reduced variables easier to interpret while minimizing the loss of variance due to not using the PCs themselves.
There is a trade-off between interpretability and variance. In this study, we followed the simplified PCA using rotation. In the simplified PCA, let $A_q$ be a $p \times q$ matrix whose columns are the loadings of the first $q$ PCs, where $p$ is the dimension of the original dataset, and let $XA_q$ be an $n \times q$ matrix whose columns are the scores on the first $q$ PCs for the $n$ observations, with $X$ the $n \times p$ data matrix. Let $T$ be an orthogonal matrix, so that $B_q = A_q T$ is a $p \times q$ matrix whose columns are the loadings of $q$ rotated PCs. The matrix $T$ is chosen to optimize some simplicity criterion, e.g. the varimax criterion. So, matrix $T$ is chosen to maximize
$$Q = \sum_{k=1}^{q} \left[ \sum_{j=1}^{p} b_{jk}^{4} - \frac{1}{p} \left( \sum_{j=1}^{p} b_{jk}^{2} \right)^{2} \right],$$
where $b_{jk}$ is the $(j,k)$th element of $B_q$ [17]. The reduced variables resulting from the simplified PCA give us the variables that measure the readiness of Indonesian manufacturing companies toward industry 4.0. We call these variables readiness variables.
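To make the rotation idea concrete, the following Python sketch evaluates a varimax-type criterion Q, the sum over components k of [sum_j b_jk^4 - (1/p)(sum_j b_jk^2)^2], for a p x q loading matrix B with entries b_jk. It then searches a coarse grid of rotation angles in the two-component case. This is a simplified illustration under assumptions, not the routine used by the psych package: the function names, the toy loadings and the grid search are hypothetical.

```python
import math


def varimax_criterion(loadings):
    """Q = sum_k [ sum_j b_jk^4 - (1/p) * (sum_j b_jk^2)^2 ] for a p x q matrix."""
    p = len(loadings)
    q = len(loadings[0])
    total = 0.0
    for k in range(q):
        squares = [loadings[j][k] ** 2 for j in range(p)]
        total += sum(s * s for s in squares) - (sum(squares) ** 2) / p
    return total


def rotate(loadings, theta):
    """Return B = A T for a p x 2 matrix A and the orthogonal rotation T(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a1 * c - a2 * s, a1 * s + a2 * c] for a1, a2 in loadings]


# Toy loadings (hypothetical): each variable loads on exactly one component.
simple = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# Blend the two components so every variable loads equally on both.
blended = rotate(simple, math.pi / 4)

# Coarse grid search over rotation angles (in degrees) that maximizes Q.
best_deg = max(
    range(90),
    key=lambda deg: varimax_criterion(rotate(blended, math.radians(deg))),
)
print(varimax_criterion(simple), varimax_criterion(blended), best_deg)
```

In this toy case the criterion is largest when each variable loads on a single component, which is why rotated components tend to be easier to name.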

Clustering
Now, we want to find the level of readiness by clustering the dataset based on those readiness variables. We used K-prototype clustering for mixed data types. It combines the K-Means and K-Modes dissimilarity measurements [19]. The prototype itself is the midpoint of a cluster. The dissimilarity measurement of K-prototype can be written as
$$d(x_i, \mu_j) = \sum_{m=1}^{p_1} \left( x_i^m - \mu_j^m \right)^2 + \lambda \sum_{m=p_1+1}^{p} \delta\left( x_i^m, \mu_j^m \right),$$
where $x_i$, $i = 1, 2, \ldots, n$, are the observations in the sample; $\mu_j$, $j = 1, 2, \ldots, k$, are the cluster prototypes; and $m$ is the variable index, where the first $p_1$ variables are numeric and the remaining $p - p_1$ are categorical. Here $\delta(a, b) = 0$ if $a = b$ and $\delta(a, b) = 1$ if $a \neq b$, so that $d()$ corresponds to a weighted sum of the Euclidean distance between two points in the metric space and the simple matching distance for categorical variables (i.e. the count of mismatches). $\lambda$ is the control variable, which has to be specified in advance. The larger the value of $\lambda$, the greater the impact of the categorical variables. If $\lambda = 0$, the impact of the categorical variables vanishes and K-prototype becomes K-means [19]. Moreover, we used the Silhouette and Dunn indices to validate the clusters (see e.g. [20] for details).
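The following Python sketch shows how a dissimilarity of this kind could be computed for one observation and how the weight on the categorical part changes the cluster assignment. It assumes the usual K-prototype form: squared Euclidean distance on the numeric variables plus a weight (here `lam`) times the number of categorical mismatches. The function names, toy observation and prototypes are hypothetical, and this is not the code of the R package used in the study.

```python
def kproto_distance(x, prototype, n_numeric, lam):
    """Squared Euclidean part on the first n_numeric entries plus
    lam times the count of mismatches on the remaining categorical entries."""
    numeric = sum(
        (a - b) ** 2 for a, b in zip(x[:n_numeric], prototype[:n_numeric])
    )
    mismatches = sum(
        1 for a, b in zip(x[n_numeric:], prototype[n_numeric:]) if a != b
    )
    return numeric + lam * mismatches


def nearest_prototype(x, prototypes, n_numeric, lam):
    """Index of the prototype with the smallest dissimilarity to x."""
    return min(
        range(len(prototypes)),
        key=lambda j: kproto_distance(x, prototypes[j], n_numeric, lam),
    )


# Toy observation (hypothetical): two numeric and two categorical variables.
x = (1.0, 2.0, "food", "aware")
prototypes = [
    (1.0, 2.5, "pharmacy", "not aware"),  # close numerically, differs in categories
    (2.0, 3.0, "food", "aware"),          # farther numerically, same categories
]

# With lam = 0 only the numeric part counts (K-means behaviour).
print(nearest_prototype(x, prototypes, n_numeric=2, lam=0.0))
# With a larger lam the categorical mismatches dominate.
print(nearest_prototype(x, prototypes, n_numeric=2, lam=1.5))
```

The two calls assign the same observation to different prototypes, which illustrates why the weight has to be fixed before clustering.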

Decision Tree
Finally, given the readiness variables of a manufacturing company, we want to predict the readiness level of that company toward industry 4.0. In this step, we used a decision tree built with the rpart package in R [21]. We split the dataset into a 65% training set and a 35% testing set, both chosen using stratified random sampling. We used the Gini index and information gain as the splitting criteria. Moreover, we used the confusion matrix to validate the model [22].
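A stratified 65/35 split can be sketched in Python as follows. This illustrates the sampling idea only, not the authors' R code: the function name, the seed and the toy class sizes (20 companies per class) are assumptions and do not reflect the group sizes in the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed):
    """Split indices into train/test so each class keeps about the same share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random order within the class
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy labels (hypothetical): three readiness classes with 20 companies each.
labels = [1] * 20 + [2] * 20 + [3] * 20
train_idx, test_idx = stratified_split(labels, train_fraction=0.65, seed=42)
print(len(train_idx), len(test_idx))
```

Sampling within each class keeps small classes represented in both subsets, which matters when one readiness level has only a few members.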

Results and Discussions
This section comprises three subsections. First, we discuss the variables that define the readiness measurement. Then, using those variables, we perform a clustering analysis to find the number of readiness groups among Indonesian manufacturing companies. Finally, given the variables defined in the first subsection, we predict each company's membership in the readiness groups defined in the second subsection.

Readiness Measurement of Indonesia Manufacturing Company toward Industry 4.0
Applying the simplified PCA, we found eight reduced variables (see Figure 1 and Table 4). In addition to these variables, we add three categorical variables: (TM) Type of manufacture; (S) Sectors; (A) Awareness of industry 4.0. We use those eleven variables to measure the readiness of Indonesian manufacturing companies toward industry 4.0.

Defining Number of Readiness Group
Here, we conduct a clustering analysis using K-prototype clustering for mixed data types. K-prototype requires K, the number of clusters, to be specified before running the algorithm. The optimal number of clusters is usually deduced from a scree plot, which plots a cluster validity measure against the number of clusters. This measure can be computed using, for example, the total within-cluster sum of squares, the Dunn index or the Silhouette (see [20] for details). The Dunn index and the Silhouette are suitable for mixed data types. The Silhouette scree plot shows that the optimal number of readiness groups is three (see Figure 2).
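The Silhouette used to compare candidate numbers of clusters can be sketched in Python as follows. It is a generic illustration that accepts any dissimilarity function, so a mixed-type dissimilarity could be plugged in. The function name and the one-dimensional toy points are assumptions, and the values are not those of Figure 2.

```python
def mean_silhouette(points, labels, distance):
    """Average silhouette width; singleton clusters contribute 0 by convention."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)
            continue
        # a: mean dissimilarity to the other members of the same cluster
        a = sum(distance(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean dissimilarity to the members of any other cluster
        b = min(
            sum(distance(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Toy data (hypothetical): two well separated pairs on a line.
points = [0.0, 1.0, 10.0, 11.0]
good = mean_silhouette(points, [0, 0, 1, 1], lambda u, v: abs(u - v))
bad = mean_silhouette(points, [0, 1, 0, 1], lambda u, v: abs(u - v))
print(good, bad)
```

In practice the clustering would be run once for each candidate K, the average Silhouette computed for each result, and the K with the largest value retained.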
We defined those three groups as 1: not ready to transform into industry 4.0; 2: in transition toward industry 4.0; 3: ready for industry 4.0. The summary statistics for each group are listed in Tables 5, 6 and 7. The variables that differentiate the mean values of the groups are Number of higher education (at least Bachelor's degree) employees; Direct labor costs; Total revenue/Total employees; and Total book value of the machine (see Table 8 for the result of the ANOVA testing).
Categorical variables (number of companies in parentheses):
Type of manufacture: (14); Pharmacy (8); Food & beverages (12); Cables (6); Automotive & components (6); Others (17)
Sectors: Basic & chemical industry (0); Various industry (31); Consumer goods industry (32)
Awareness toward industry 4.0: Not aware (37); Aware (6); Aware & using (20)

Classification using Decision Tree
Here we used the decision tree to classify the readiness membership of a company. The decision tree is built from the eleven variables defined in the first subsection. To validate the result, we split the dataset into training (65%) and testing (35%) datasets. The models were built using the training dataset and then validated on the testing dataset. We used stratified random sampling to select the members of each dataset. The splitting composition is given in Table 9.
We used the tree partition algorithm and the Gini index, which measures the misclassification of the dataset. The Gini index is formulated as follows:
$$G = \sum_{k=1}^{K} p(k)\left(1 - p(k)\right) = 1 - \sum_{k=1}^{K} p(k)^{2},$$
where $p(k)$ is the probability that an outcome from class $k$ is correctly classified as class $k$; $k$ is the class, $k = 1, 2, \ldots, K$; and $K$ is the total number of classes (Ledolter, 2013). The resulting decision tree is depicted in Figure 3, and from this tree we obtain the decision rule to classify each company into the three readiness classes toward industry 4.0. The classification rule for Class 3 is simple since, in this study, we found only six companies classified in this class. This is a limitation of this study. The confusion matrix of the decision tree using the Gini index is given in Table 10; it shows that the precision for Class 1 is 0.74, for Class 2 is 0.75 and for Class 3 is 1. On average, the decision tree correctly predicts the readiness of a company toward industry 4.0 in 76% of cases; that is, the precision of the decision tree rule is 76%.
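As a worked illustration, the Python sketch below computes the Gini index of a tree node from its class counts, using the common form 1 minus the sum of squared class proportions, and the per-class precision from a confusion matrix. The function names, the matrix orientation (rows are actual classes, columns are predicted classes) and the toy counts are assumptions; the numbers are not those of Table 10.

```python
def gini_index(class_counts):
    """Gini impurity 1 - sum_k p_k^2, with p_k the share of class k in the node."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("node must contain at least one observation")
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def precision_per_class(confusion):
    """Precision per class for confusion[actual][predicted]:
    correct predictions of class k divided by all predictions of class k."""
    n = len(confusion)
    result = []
    for k in range(n):
        predicted_k = sum(confusion[actual][k] for actual in range(n))
        if predicted_k:
            result.append(confusion[k][k] / predicted_k)
        else:
            result.append(float("nan"))
    return result


# A pure node has impurity 0; an evenly mixed two-class node has impurity 0.5.
print(gini_index([10, 0]), gini_index([5, 5]))

# Toy confusion matrix (hypothetical counts, three readiness classes).
toy_confusion = [
    [3, 1, 0],
    [1, 3, 0],
    [0, 0, 2],
]
print(precision_per_class(toy_confusion))
```

A split is preferred when it lowers the weighted impurity of the child nodes, and the per-class precision is read column by column from the confusion matrix.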