Bayesian-based Project Monitoring: Framework Development and Model Testing

During project implementation, risk becomes an integral part of project monitoring. Therefore, a tool that can dynamically include elements of risk in project progress monitoring is needed. The objective of this study is to develop a general framework that addresses this need. The developed framework consists of three interrelated building blocks for dynamic project monitoring: Risk Register (RR), Bayesian Network (BN), and Project Time Network (PTN). RR is used to list and categorize identified project risks. PTN models the relationships between project activities. BN reflects the interdependence among risk factors and bridges RR and PTN. A residential development project is chosen as a working example, and the result shows that the proposed framework can be applied successfully. A specific model of the development project is also developed and used to monitor the project progress. The study shows that the proposed BN-based model provides superior forecast accuracy compared to the extant models.


Introduction
Project management is widely applied across domains such as construction, product development, research and development, and business process reengineering. Due to increasing complexity, project managers find it challenging to complete projects within the triple constraints (i.e., on time, within budget, within scope). The Chaos Report by the Standish Group [1] states that 31% of projects were cancelled before completion and 53% of projects had cost overruns of up to 189% of the original budget. It also found that, of every one hundred projects initiated, 94 experienced rework. About one-third of the surveyed projects experienced delays of up to 300% of the baseline. Only around 16% of projects were completed on schedule. A similar pattern is reported in subsequent studies.
One of the factors that can improve project success is the proper implementation of project monitoring. The Project Management Institute (PMI [2]) asserts that project monitoring is one of the key activities that project managers must manage carefully to ensure project success.
Project monitoring is conducted regularly by the project team once the project starts. The objective of such an activity is to gather the latest project information and to compare it with the project plan baseline. Any major discrepancy between the actual data from the field and the baseline requires a response from the project team. From the project time management perspective, the project team needs to make up-to-date predictions of the total project duration as well as of the project progress for the remaining project period.
1 Industrial Engineering Program, Mechanical and Industrial Engineering Department, Universitas Gadjah Mada, Jl. Grafika no. 2, Jogjakarta 55281, Indonesia. Email: boed@gadjahmada.edu. * Corresponding author
Earned Value Analysis (EVA) is a popular tool to monitor project progress (PMI [3], Rozenes et al. [4]). The tool integrates two critical factors of project management in its analysis, namely project "time" and "cost". By using EVA, project analysts are able to assess simultaneously whether the observed project is ahead of or behind schedule and under or over budget. EVA also provides a forecast of the schedule and expenditure given the current and past performance.
EVA has some advantages over similar models. The major advantage is the possibility of integrating two project performance dimensions (i.e., time and cost) within a single project evaluation framework. Accordingly, EVA provides a more accurate reflection of a project than monitoring models that focus on a single performance dimension. Despite these advantages, the original EVA method has some drawbacks. Firstly, when an observed project is accomplished, the schedule performance index (SPI) of EVA always equals 1.00 (and the schedule variance, SV, equals zero) even if the project is completed behind schedule. In addition, EVA does not facilitate a deeper analysis of the factors that contribute to the project performance. The interdependency among the factors affecting project delay cannot be specified. Because of these limitations, many studies have been conducted to extend the original model and to improve its performance.
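The end-of-project behaviour of the classic schedule indicators can be illustrated with a minimal sketch. The cumulative planned value and earned value figures below are hypothetical and are not taken from the case project; the formulas are the standard EVA definitions SV = EV - PV and SPI = EV / PV.

```python
# Hypothetical cumulative values (monetary units) for a project planned for
# 4 periods that actually finished in 6 periods.
planned_value = [25, 50, 75, 100, 100, 100]  # baseline stays flat after planned finish
earned_value = [20, 35, 55, 70, 90, 100]     # actual progress, finished late


def schedule_variance(ev, pv):
    """Classic EVA schedule variance, expressed in monetary units."""
    return ev - pv


def schedule_performance_index(ev, pv):
    """Classic EVA schedule performance index (dimensionless)."""
    return ev / pv


sv = [schedule_variance(e, p) for e, p in zip(earned_value, planned_value)]
spi = [schedule_performance_index(e, p) for e, p in zip(earned_value, planned_value)]

# At completion the earned value equals the planned value, so the indicators
# report "on schedule" although the project finished two periods late.
print(sv[-1], spi[-1])
```

Because both indicators are computed from monetary values only, they carry no information about lateness once all planned work has been earned.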
A study by Anbari [5], for instance, proposed the planned value method. Using the planned value (PV) method, Anbari [5] calculated the planned value rate (PVR), which is equal to the average planned value per period. In this method, the time variance (TV) is computed by dividing SV by PVR. Time variance, in this case, refers to the difference between planned and actual progress in time units (e.g., day, week). Schedule variance (SV), on the other hand, is the difference between planned and actual progress in monetary values. Another study by Jacob [6] introduced the earned duration method for predicting project duration. Earned duration (ED) is obtained by multiplying the actual duration (AD) by the schedule performance index (SPI). A variation of such a method is presented in equation (1).
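The quantities named in this paragraph can be sketched as follows. The sketch assumes that the "average planned value per period" is the budget at completion divided by the planned duration; all input numbers are hypothetical.

```python
def planned_value_rate(budget_at_completion, planned_duration):
    """PVR: average planned value per period (assumed to be BAC / PD)."""
    return budget_at_completion / planned_duration


def time_variance(ev, pv, pvr):
    """TV = SV / PVR, i.e. the schedule variance converted into time units."""
    return (ev - pv) / pvr


def earned_duration(actual_duration, ev, pv):
    """ED = AD * SPI, with SPI = EV / PV."""
    return actual_duration * (ev / pv)


# Hypothetical status at week 8 of a 26-week project
pvr = planned_value_rate(budget_at_completion=2600.0, planned_duration=26)
tv = time_variance(ev=700.0, pv=800.0, pvr=pvr)
ed = earned_duration(actual_duration=8, ev=700.0, pv=800.0)
print(pvr, tv, ed)
```

A negative TV means the project is behind schedule by that many periods.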
In equation (1), $EAC(t)$ is the estimated duration at completion at a particular observation time $t$, the planned duration ($PD$) is the initial planned duration, and the performance factor ($PF$) is an indicator of past performance that is then used to predict future performance. Hence, in this case, the model estimates the project duration at $t$ by assuming that past performance will be replicated in the future as dictated by $PF$.
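For orientation, a commonly cited form of the earned duration forecast, as summarized in the comparison by Vandevoorde and Vanhoucke [9], is given below. It is stated here as an assumption and is not necessarily the exact variation referred to as equation (1).

```latex
EAC(t) = AD + \frac{PD - ED}{PF}, \qquad ED = AD \cdot SPI
```

With $PF = 1$ the remaining work is assumed to proceed as planned, which gives $EAC(t) = PD + AD\,(1 - SPI)$; with $PF = SPI$ the forecast reduces to $EAC(t) = PD / SPI$.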
Another model was proposed by Lipke [7,8], who introduced the concept of earned schedule (ES). ES is derived by comparing the cumulative budgeted cost of work performed (BCWP) with the baseline, i.e., the budgeted cost of work scheduled (BCWS), for a particular time of observation $t$. Graphically, ES can be determined by drawing a horizontal line connecting the BCWP observed at time 3 and the corresponding point on the BCWS baseline. The length of the line indicates the schedule variance in time units. The corresponding schedule performance metrics are given by equations (2) and (3).
$$SV(t) = ES - AT \qquad (2)$$
$$SPI(t) = \frac{ES}{AT} \qquad (3)$$
For equation (2), $AT$ indicates the actual time that has been spent on the project when the observation is carried out at $t$. Contrary to the original SV, which is expressed in monetary terms, $SV(t)$ is expressed in time units, so it is intuitively easier to understand. The value of $SV(t)$ is the difference between the earned schedule and the actual duration. Accordingly, a positive $SV(t)$ indicates good project performance (i.e., ahead of schedule).
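The earned schedule calculation can be sketched as below. The sketch assumes a baseline that starts at zero and is linear within each period, which is the usual interpolation convention rather than something stated in this paper; the baseline values are hypothetical.

```python
def earned_schedule(ev, pv_cumulative):
    """Time at which the baseline (BCWS) reaches the current earned value (BCWP).

    pv_cumulative[i] is the cumulative planned value at the end of period i + 1.
    Assumes ev >= 0, a baseline starting at 0, and linear growth within a period.
    """
    baseline = [0.0] + list(pv_cumulative)
    for c in range(len(baseline) - 1):
        low, high = baseline[c], baseline[c + 1]
        if low <= ev < high:
            # c whole periods earned plus a fraction of period c + 1
            return c + (ev - low) / (high - low)
    return float(len(pv_cumulative))  # earned value has reached the final baseline


# Hypothetical baseline and status at actual time AT = 3
pv_cumulative = [10, 30, 60, 100]
actual_time = 3
es = earned_schedule(ev=45, pv_cumulative=pv_cumulative)
sv_t = es - actual_time   # time-based schedule variance: ES minus AT
spi_t = es / actual_time  # time-based schedule performance index: ES over AT
print(es, sv_t, round(spi_t, 3))
```

Here the project has earned in three periods what the baseline planned to earn in 2.5 periods, so it is half a period behind schedule.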
$SPI(t)$ indicates a similar performance by using a ratio. Accordingly, an $SPI(t)$ of less than one indicates poor project performance in terms of schedule (i.e., schedule slip). Vandevoorde and Vanhoucke [9] evaluated the studies conducted by Anbari [5], Jacob [6], and Lipke [7,8]. They compared the planned value, earned duration, and earned schedule methods by using three different project contexts, namely the "Re-vamp check-in" project with a late-finish, under-budget condition; the "Link Lines" project with a late-finish, over-budget condition; and the "Transfer Platform" project with an early-finish, over-budget condition. The results indicated that ES consistently predicted the duration of the project better than the other methods.
Some quantitative models for project monitoring further extend the concept by including risk factors within the analysis. One of the promising methods is the Bayesian approach. Bayesian analysis is a statistical method that predicts the probability of a particular state or event by using information or data on other (probabilistically) related states or events (Weisstein [10], Clemen and Reilly [11]). For instance, when it is believed that the probability of success of a particular project activity is (probabilistically) affected by the state of the weather, new information or evidence on the state of the weather can be used to infer more accurately the likelihood of project success.
A Bayesian model representing two related events or states is expressed in Equation (4).
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \qquad (4)$$
where $A$ and $B$ are the two related events; $P(A)$ and $P(B)$ are the independent probabilities of $A$ and $B$, respectively; and $P(B \mid A)$ is the conditional probability of observing event $B$ given that event $A$ is true. In general risk analysis, $B$ is usually seen as "evidence", while $A$ is the event to be predicted. As new evidence unfolds in the form of information about $B$, the Bayesian analysis facilitates the updating of the probability of event $A$. $P(A)$ is termed the prior probability, i.e., the probability of $A$ before the introduction of information pertaining to $B$.
As the name suggests, a Bayesian Belief Network (BBN) is a pictorial presentation of the corresponding Bayesian probability model (Clemen and Reilly [11]). A BBN represents the causal probability of variables by using nodes (representing variables, events, or states) and arrows (depicting the causality chains). According to McCabe and Ford [12], BBN has many advantages for analytical purposes. They include: (a) easy data fusion, i.e., the possibility of combining various data sources, including hard historical data and subjective expert judgment; and (b) intuitive appeal, i.e., users or analysts with a limited background in advanced probability theory can use the model intuitively. Due to its practicality, various BBN-based models have been developed for risk analysis in different settings (e.g., Lee et al. [13], Trucco et al. [14], Lee [15]).
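The updating rule in Equation (4) can be illustrated with the weather example mentioned earlier. In this minimal sketch, A is "the activity finishes on time" and B is "rain is observed"; all probabilities are hypothetical, and P(B) is expanded with the law of total probability.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    # Law of total probability for the evidence term P(B)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical inputs: the activity is on time with prior probability 0.8;
# rain is observed in 20% of on-time cases and in 70% of late cases.
p_on_time_given_rain = posterior(prior_a=0.8, p_b_given_a=0.2, p_b_given_not_a=0.7)
print(round(p_on_time_given_rain, 3))
```

Observing rain lowers the belief that the activity finishes on time from the prior of 0.8 to roughly 0.53.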
A study by Gardoni et al. [16] analysed a probabilistic framework for project progress. The purpose of that study was to predict the project progress and its completion time. With the Bayesian approach, new information can be incorporated and updated within the analysis. The proposed model is also able to include some risk factors. Although the model has some academic merit, it is not readily applicable by practitioners and lacks generality.
Arizaga [17] developed a BN-based framework for monitoring project duration and cost. The model includes a risk register for listing all potential risks, Bayesian networks for modeling interrelated risk factors, and a project network for reflecting relationships among project activities. The contribution of that study is the use of risk registers to develop a Bayesian Network (BN) model. The risk register is used to record the identified risks in a project, while the BN model reflects the interrelationships among the risk factors.
Despite the comprehensive features of the framework developed by Arizaga [17], some room for improvement remains. Firstly, the interlinking procedure between the risk register and the Bayesian Network model in Arizaga's model is less than clear. Secondly, Arizaga's model has not been verified by using data from a finished project (i.e., it lacks a post-project evaluation). A comparison with existing models in terms of prediction accuracy is also necessary. Hence, this study is conducted to address these research opportunities.
The objectives of this study are (a) to develop a general framework for project progress monitoring, (b) to develop a Bayesian-based model on the basis of a real project case to provide some evidence of the utility of the proposed concept, and (c) to evaluate the performance of the proposed framework and model. This report is an extended version of a previous conference paper by Ayuningtyas and Hartono [18].

General Procedure
In general, the study has two major stages. The first stage is the development of a general framework that includes Bayesian analysis within the project-monitoring model. The general framework is expected to be applicable to various types of projects. Once the framework is developed, it is tested in the second stage, in which a specific BN-based model is developed for a real project case. The performance of the model is then evaluated by comparing the accuracy of the new model with that of the existing models.
A real development project in Indonesia is selected to provide some evidence of the utility of the proposed concept. Since the selected project had been concluded by the time the research started, all pertinent project data are available for the research. Primary project data, including project risks, risk probabilities, and problem solving, are collected by means of expert interviews. Secondary data, including time and budget planning and weekly project reports, are treated as modeling inputs to simulate "unfolded" information in a project management progress report or review meeting. To mimic the real dynamic project conditions, only the data that would have been available at the time are used in the analysis. For instance, if researchers intend to analyze the project in the 8th week, all project information available from the 1st to the 8th week is used. Project data from the 9th week onwards are not used, because in the real project such data would not yet be available.

Framework
The work by Arizaga [17] is used as a key reference for this study. In comparison to the work by Arizaga [17], this framework refines the interface between the risk register and the BN model. Figure 1 depicts the proposed general framework for project monitoring, which integrates the Risk Register (RR), Bayesian Network (BN), and Project Time Network (PTN). The figure shows the integration of the three major blocks and provides general ideas on how to transform data from RR into BN and into PTN.
The Risk Register is a table with eight columns, namely: category, risk, cause, specific cause, probability, impact, impact for completion time, and response. According to the Risk Register depicted in Figure 1, "Category" classifies distinct project stages, such as "bidding", "procurement", and "implementation". All project risks in any category are identified and recorded in the "Risk" column. The cause of the risk is described in detail in the "Specific Cause" column, which is divided into two types: "internal" and "external". "Internal" reflects uncertainties that are attributed to the project team, while "External" refers to those beyond the authority of the project team, such as suppliers or natural events. Meanwhile, the "Probability" column reflects the states and the probability of occurrence for the related risk. For example, "bidding during rainy season" as a risk has two states, "Yes" or "No". Each state has a probability value between 0 and 1. The "Impact" column indicates two different types of risk consequence for project activities. "Local" refers to a risk that affects only a single project activity, while "Global" refers to a risk that affects all project activities. For each risk that may occur, the "Response" column describes the possible action to manage the risk. The potential impact on the project duration is recorded in the "Magnitude of Impact (time)" column, which can be expressed as a constant number or a probability distribution.
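One row of such a register can be represented as a small record type. This is only an illustrative sketch: the field names are hypothetical renderings of the eight columns, the probability and impact values follow the "bidding during rainy season" example given in the case study, and the specific cause and response texts are invented placeholders.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    """One row of the risk register (field names are illustrative)."""
    category: str        # project stage, e.g. "bidding", "procurement"
    risk: str
    cause: str           # "internal" or "external"
    specific_cause: str
    probability: dict    # state name -> probability, e.g. {"Yes": 0.2, "No": 0.8}
    impact: str          # "local" or "global"
    time_impact: float   # magnitude of impact on completion time
    response: str

    def __post_init__(self):
        # The state probabilities of one risk must form a distribution.
        total = sum(self.probability.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"state probabilities sum to {total}, expected 1.0")


entry = RiskEntry(
    category="pre-project",
    risk="bidding during rainy season",
    cause="external",
    specific_cause="seasonal weather",          # placeholder text
    probability={"Yes": 0.2, "No": 0.8},
    impact="global",
    time_impact=0.9,
    response="reschedule weather-sensitive work",  # placeholder text
)
print(entry.risk, entry.probability["Yes"])
```

Keeping the state probabilities in one mapping makes it straightforward to pass a row to a chance node of the Bayesian network later.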
In the RR block, as depicted in Figure 1, R1 is an identified, intermediate risk within the project, while RF1 is the identified "root cause" (i.e., a risk factor) that affects R1 and other risks. P1 in the "Prob" column is attributed to R1 and represents its probability of occurrence. The value of P1 is generated when RF1 occurs. Meanwhile, the "Impact (time)" column reflects the severity of the particular risk for the project duration.
Figure 1 also provides a generic illustration of three BN clusters, namely BBN1, BBN2, and BBN3; two independent risk factors, namely RF1 and RF3; and a dependent risk factor, RF2, whose occurrence is influenced by other events. RF1, as a primary cause with a certain probability of occurring, may have a cascading effect on all subsequent events in BBN1 and may determine the occurrence of R1. It should be noted that, due to the unique nature of each project (PMI [2]), past historical data are very limited; hence, on most occasions the probability has to be determined by expert judgment. BBN1 then influences the probability value of RF2. BBN2 evaluates the potential impact caused by RF1 and RF2, called Impact1. Impact1 directly affects activities A, C, E, and F. Driven by RF3, BBN3 generates a second impact, called Impact2, which influences activities E, F, and G simultaneously. Thus, in this example, activities E and F are affected by both Impact1 and Impact2. These consequences are then translated into a productivity ratio for each activity, expressed as a value between 0 and 1. The productivity ratio is the output of the BBN block analysis, and its value depends on the model inputs and the BBN structure. The ratio is then used by the Project Time Network (PTN) block to adjust the estimated duration of the pertaining project activity.
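The mapping from impacts to activities in this generic example can be sketched as follows. Two points are assumptions made only for illustration and are not specified in the framework description: overlapping impacts are combined by multiplication, and a duration is adjusted by dividing the planned duration by the productivity ratio. The ratio values and planned durations are hypothetical.

```python
# Which activities each impact affects, following the generic example of Figure 1
affected = {
    "Impact1": {"A", "C", "E", "F"},
    "Impact2": {"E", "F", "G"},
}
# Hypothetical productivity ratios produced by the BBN block (between 0 and 1)
impact_ratio = {"Impact1": 0.9, "Impact2": 0.8}
# Hypothetical planned durations in weeks
planned = {"A": 4, "B": 3, "C": 5, "D": 2, "E": 6, "F": 3, "G": 4}


def productivity(activity):
    """Combined productivity ratio of one activity (assumed multiplicative)."""
    ratio = 1.0
    for impact, activities in affected.items():
        if activity in activities:
            ratio *= impact_ratio[impact]
    return ratio


# Assumed adjustment rule: lower productivity stretches the duration.
adjusted = {name: weeks / productivity(name) for name, weeks in planned.items()}
print({name: round(weeks, 2) for name, weeks in adjusted.items()})
```

Activities B and D keep their planned durations, while E and F are stretched by both impacts.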

Utilization: The Iterative Process of Project Monitoring
The first step in using the framework is to develop the project time schedule (project time network, PTN) model. The spreadsheet-based model identifies all project activities and the logical sequences (interdependences, successors, and predecessors) among the activities. The second step is to build the risk register containing all identified risks and the risk assessment. The third step is to transform the Risk Register into a Bayesian Network (BN) model. An example of such a transformation is illustrated in the "Case Example" section. The BN model reflects the interrelationships among the risk factors. The BN model is then interlinked with the project network model. The risks and uncertainties represented by the BN affect "productivity" directly, which in turn affects the variation of activity durations.
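The role of the project time network can be sketched with a forward pass of the critical path method over a small, hypothetical activity network. The rule that a productivity ratio lengthens an activity by dividing its planned duration is an assumption used for illustration only.

```python
def project_duration(durations, predecessors):
    """Forward pass of the critical path method on an acyclic activity network.

    durations: activity -> duration; predecessors: activity -> list of activities.
    Returns the earliest possible finish time of the whole project.
    """
    earliest_finish = {}

    def finish(activity):
        if activity not in earliest_finish:
            # An activity starts when its latest predecessor has finished.
            start = max(
                (finish(p) for p in predecessors.get(activity, [])), default=0.0
            )
            earliest_finish[activity] = start + durations[activity]
        return earliest_finish[activity]

    return max(finish(a) for a in durations)


# Hypothetical network: B and C follow A; D follows both B and C.
durations = {"A": 3, "B": 2, "C": 4, "D": 1}
predecessors = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
baseline = project_duration(durations, predecessors)

# A productivity ratio of 0.8 on activity C (assumed rule: duration / ratio)
slowed = dict(durations, C=durations["C"] / 0.8)
updated = project_duration(slowed, predecessors)
print(baseline, updated)
```

Because C lies on the critical path in this example, a lower productivity on C lengthens the whole project.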
Figure 2 illustrates the iterative, dynamic process of project monitoring using the proposed BN-based framework. Once a project is kicked off, all pertinent information available at the time (the initial data) is used as model input. After a specific time period, when project monitoring is carried out, new ("unfolded") information will have become available. The new information is used as model input, and the simulation is re-run to obtain updated model outputs. The same procedure is repeated for the subsequent project monitoring activities.

Implementation
To test the applicability of the framework, a real project case is analyzed by using the framework. The utility of the framework is demonstrated by using the case. The selected project was "the House Relocation Project for Landslide Disaster Victims".

Figure 2. Iterative process of project monitoring
The project was located in Central Java, Indonesia. The planned duration of the project was 26 weeks, from February to July, while the initial committed budget was Rp 2.5 billion. Since the project had been completed by the time the research was carried out, actual project completion data (e.g., actual project duration) are available in addition to the data from project planning.
The first step in the model development was interviewing the project members to identify project risks. Interviews were conducted with six project members to obtain accurate information. Biases due to subjectivity in the risk assessment can be reduced by acquiring data from multiple project members. Data in the form of risk factors are stored in the Risk Register.
Table 1 partially shows the interview results in the form of a project risk register. The first category of identified risks is "pre-project", which covers all identifiable risks prior to project commencement. For instance, "bidding during rainy season" is one of the identified risks for this stage. According to the interviews, the risk is placed in the "external" category because it is related to nature and is beyond the control of project management. This risk is also considered to have a generic or global impact on overall project activities or work packages. The risk was believed to have a probability of about 20%, and its impact on project duration is on a scale of about 0.9. As mentioned earlier, all the numbers (including probability and impact) were taken from expert judgment due to the very limited pertinent historical data. The second part of the risk register is "procurement". All pertinent risks were identified and recorded. The same protocol was carried out for the other stages (not shown here).
The second step is the development of the project network model (i.e., the project schedule) using a spreadsheet. All activities taken from the Work Breakdown Structure (WBS) are identified and arranged in a logical order. Initial estimates for activity durations are provided, and the total project duration is computed by means of the critical path method (Table 2).
The linkage for each factor shown in Figure 3 has been developed according to the linkages defined in the Risk Register. For example, the value node "Delivery Lateness" depends on three risk factors, namely "Vehicle Lack", "Unavailable Material", and "Accidental Demand". This means that "Delivery Lateness" is influenced by a combination of those three risks. The linkage is consistent with the risk register, which lists "Delivery Lateness" as a risk and the three risk factors as its specific causes.
The tree diagram depicted in Figure 4 illustrates the three possible states of "Subcont. work lateness", which are: seldom, sometimes, and often. The states have probability values of 0.1, 0.7, and 0.2, respectively. The values show the subjective probability of a risk that may appear in the project. Moreover, every state also has a value that expresses the consequence of the risk. For example, if a project is delayed because of its subcontractors, project productivity will be reduced to 0.7.
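How a chance node of this kind can feed a value node may be sketched as an expected-value calculation. The state probabilities are those quoted above; the productivity attached to each state is hypothetical (only the value 0.7 is mentioned in the text, and assigning it to the "often" state is an assumption), as is the choice of the expectation as the aggregation rule.

```python
# state -> (probability, productivity if that state occurs)
subcontractor_lateness = {
    "seldom": (0.1, 1.0),     # hypothetical consequence value
    "sometimes": (0.7, 0.9),  # hypothetical consequence value
    "often": (0.2, 0.7),      # 0.7 quoted in the text; state assignment assumed
}


def expected_productivity(node):
    """Probability-weighted productivity over the states of a chance node."""
    total_probability = sum(p for p, _ in node.values())
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(p * value for p, value in node.values())


ratio = expected_productivity(subcontractor_lateness)
print(round(ratio, 2))
```

When monitoring data shift the state probabilities, re-running the same calculation yields the updated productivity ratio.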
The duration estimates at the work-package level are affected by the productivity ratio for the particular work. As mentioned earlier, the productivity ratio is the output of the BBN model (taken from the value node), which represents risks and their interdependency. In other words, as opposed to the fixed work-package-level duration estimates in traditional project network models, the BBN-based model provides dynamic updates of the "duration estimates" by considering "productivity", which in turn is affected by interlinked "risks".
During project monitoring, the values of the chance nodes in the BN model can be updated as new information becomes available. Updated inputs lead to a re-calculation of the BN model output. This, in turn, triggers the recomputation of the project network. Hence, new estimates of the total project duration and the project progression (the S-curve) become available whenever new information is entered during the project monitoring process.
To assess the efficacy of the developed quantitative model, the accuracy of its predictions (total duration and S-curve) is compared with that of extant models.
The following passages elaborate on this comparison.

Predicting Project Total Duration
The total project duration is estimated by using the proposed model in the case example. To reflect the real procedure of project monitoring, a retrospective procedure is applied. Thus, when a prediction is carried out at the nth week, the analysis uses only project data from the nth week and earlier. For the assessment, the prediction of the total project duration is carried out for every week from the first to the final week.
In this case, due to the limited data, only one type of data is updated in the BN model throughout the whole duration of the project monitoring, namely the frequency of rain. By using these data, the probability value of the risk factor "rain" can be updated.
This study also uses six extant models developed on the basis of EVA (i.e., EVA-based models) for comparison. The models refer to the following three methods.
(1) Planned value (PV), which uses the planned value rate for the calculation. The planned value rate is the average planned value per time period.
(2) Earned duration (ED), which is the product of the actual duration and the schedule performance index (SPI).
(3) Earned schedule (ES), where the earned value at a certain point in time is traced forward or backward to the performance baseline.
Each of the three methods is transformed into two models on the basis of the performance factor (PF) value. The first model uses PF = 1, assuming that the remaining activities will go as planned. The second model uses PF = SPI, assuming that the remaining activities will be carried out at the same performance level as the previous activities. Table 3 shows the EVA-based models and the respective parameters used in this study.
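The six comparison models can be sketched as below. Since the parameters of Table 3 are not reproduced in the text, the formulas follow the forms commonly given for these methods in the comparison by Vandevoorde and Vanhoucke [9] and should be read as an assumption; for the earned schedule method the time-based index SPI(t) = ES / AT is used as the performance factor. All input values are hypothetical.

```python
def eva_duration_forecasts(planned_duration, actual_duration, ev, pv, bac, es):
    """Duration forecasts EAC(t) of six EVA-based models (assumed formulas)."""
    spi = ev / pv                      # schedule performance index
    pvr = bac / planned_duration       # planned value rate (assumed BAC / PD)
    tv = (ev - pv) / pvr               # time variance
    ed = actual_duration * spi         # earned duration
    spi_t = es / actual_duration       # time-based schedule performance index
    return {
        "PV1": planned_duration - tv,                              # PF = 1
        "PV2": planned_duration / spi,                             # PF = SPI
        "ED1": actual_duration + (planned_duration - ed),          # PF = 1
        "ED2": actual_duration + (planned_duration - ed) / spi,    # PF = SPI
        "ES1": actual_duration + (planned_duration - es),          # PF = 1
        "ES2": actual_duration + (planned_duration - es) / spi_t,  # PF = SPI(t)
    }


# Hypothetical status at week 8 of a 26-week project
forecasts = eva_duration_forecasts(
    planned_duration=26, actual_duration=8, ev=700.0, pv=800.0, bac=2600.0, es=6.5
)
for name, weeks in forecasts.items():
    print(name, round(weeks, 2))
```

The PF = 1 variants only add the delay accumulated so far, whereas the PF = SPI variants extrapolate the past performance to the remaining work and therefore forecast longer durations for a project that is behind schedule.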
Figure 4 shows the predicted total project durations at different monitoring times. At the initial stages (the 1st, 2nd, and 3rd weeks), EVA-based models such as ED1 and ED2 yield zero or even negative values for the estimated project duration. This result shows that EVA-based models are limited in predicting project duration, especially when the earned value is greater than the planned value. At the 4th week, the proposed model, by using available data only from the 1st and 2nd weeks, predicts that the project will be completed in 24 weeks. At the same week, the EVA-based model using PV (i.e., the 1st model) predicts a project duration of 21 weeks. The actual project duration of 26 weeks is also plotted in the graph.
As can be seen, all EVA-based models tend to be overly optimistic when estimations are carried out in the 1st to 7th periods. Estimations become pessimistic from the 9th to the 26th period. The 8th period is the turning point from optimistic to pessimistic estimation. This is because, before the 8th period, the earned value is always greater than the planned value, whereas after the 8th period the earned value becomes smaller than the planned value. Labor productivity, which is initially good, decreases until the last period. Estimations of the BN model follow the same pattern but show smaller deviations from the actual duration. To assess the accuracy of the models, the mean absolute percentage error (MAPE) is used. The error is calculated by comparing the predicted project duration of each model with the actual project duration of 26 weeks. The result shows that the BN model outperforms the EVA-based models in terms of estimation accuracy. The MAPE of the BN model is 5.47%, while the best EVA-based model yields a MAPE of 7.26%. The result for each model is shown in Table 4.
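The accuracy measure can be sketched as follows. The weekly predictions are hypothetical and serve only to show the calculation against the actual duration of 26 weeks; they are not the values behind Table 4.

```python
def mape(predictions, actual):
    """Mean absolute percentage error of duration forecasts against one actual value."""
    return 100.0 * sum(abs(p - actual) / actual for p in predictions) / len(predictions)


actual_duration = 26                  # actual project duration in weeks
weekly_predictions = [24, 25, 27, 26]  # hypothetical forecasts made in four weeks
error = mape(weekly_predictions, actual_duration)
print(round(error, 2))
```

Because the errors enter as absolute values, MAPE measures the size of the forecast errors but not their direction.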

Predicting Project Progression (S-Curve)
In this section, the performance of the BN and EVA-based models in predicting project progression (the S-curve) at a certain point in time is reported. The 8th period is chosen as the "now" point, and the project progress from the 9th period onwards is predicted. For this prediction, the proposed model uses all data from the 1st to the 8th week. For the EVA-based model, the SPI (schedule performance index) of the 8th period is used for the computation from the 9th period onwards. This is a limitation of the EVA-based model, because the model always assumes that future productivity is the same as that in the previous stages. The BN model, on the other hand, predicts productivity for each period by updating all information recorded up to that time and by considering potential risks.

Testing a Possible Bias in Prediction
A t-test is used to identify the possible existence of a systematic bias in the BN-based predictions of total project duration. Prediction errors are gauged against a value of zero. The null hypothesis (H0) is that the mean prediction error, which is calculated by subtracting the estimated project duration from the actual project duration, is equal to zero. The complete specification of H0 and H1 is given in Table 5. The t-test suggests that the prediction by the BN model has a systematic error (mean = 1.038; SD = 1.843; p = 0.004), while the EVA models have random errors, as shown in Table 6. A closer examination suggests that predictions by the BN model are in general overly optimistic, i.e., the predicted duration is less than the actual duration. The existence of a systematic bias opens the opportunity to improve the model accuracy by adjusting the output to compensate for the systematic error. A follow-up study is required to achieve this objective.
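The test statistic behind this comparison can be sketched from the reported summary values. The sample size of 26 is an assumption based on one prediction per week over the 26-week project; the sketch computes only the t statistic and does not reproduce the reported p-value.

```python
import math


def one_sample_t(mean_error, sd_error, n, hypothesized_mean=0.0):
    """t statistic for H0: the mean prediction error equals hypothesized_mean."""
    standard_error = sd_error / math.sqrt(n)
    return (mean_error - hypothesized_mean) / standard_error


n_predictions = 26  # assumed: one prediction per week
t_statistic = one_sample_t(mean_error=1.038, sd_error=1.843, n=n_predictions)
degrees_of_freedom = n_predictions - 1
print(round(t_statistic, 3), degrees_of_freedom)
```

A positive mean error (actual minus estimated) corresponds to predictions that are shorter than the actual duration, which matches the optimistic bias described above.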
The MAPE and t-test evaluations together give an interesting result. MAPE is a metric that indicates the accuracy of a forecast. It is computed by averaging the absolute differences between the forecasted and the actual values, expressed as a percentage of the actual values. Hence, the smaller the value, the more accurate the prediction. This study has demonstrated that the proposed BN-based model has the smallest MAPE compared to the six other models. Hence, in terms of accuracy, the proposed model is superior.
The t-test, on the other hand, evaluates whether a systematic error is observable in the prediction. In forecasting, a systematic error indicates an opportunity to improve the performance of the estimate (Mak and Raftery [19], Cleaves [20], Hartono et al. [21], Flyvbjerg et al. [22]). The improvement can be achieved by identifying the source of the systematic error and eliminating it. This study has yet to identify the source. From the results of the two evaluations, it can be concluded that the proposed BN-based model delivers superior accuracy and that there is still an opportunity to improve its performance.

Conclusion
A new framework to monitor project progress has been developed by integrating the Risk Register, Bayesian Belief Network, and Project Network. The framework has been applied successfully to a residential project as a case study. The result shows that the BN model outperforms extant models in terms of prediction accuracy for the total project duration as well as for project progress. A systematic bias in prediction is identified in the proposed model. Hence, there is an opportunity for a follow-up study to improve the model by identifying the source of the bias and eliminating it. Another possible future study is to apply the framework to much more complex projects. The framework could also be extended to incorporate the two other elements of the project's triple constraints, i.e., project cost and scope.

Figure 1. General framework
The Gantt chart in spreadsheet form (Table 2) models the project network. The spreadsheet-based model facilitates the computation of project duration by considering the duration estimates of each work package and the logical dependencies among work packages. The two previous models are then integrated by means of a BN model. The BN model defines the interrelationship among the risk factors previously identified in the Risk Register.

Figure 3 depicts the BN model of the particular project case. The model is divided into two major parts, namely Global Productivity and Local Productivity. The classification follows the Risk Register form. As mentioned earlier, risk factors that are identified by experts as having an impact on the overall project are classified as Global Risk. Those risks affect the project's Global Productivity. In contrast, risk factors that are considered to have a partial effect on a few specific project activities are related to Local Productivity.

Figure 3. Bayesian network of the project
For this observed project, Local Productivity is built upon thirteen chance nodes and a corresponding value node, as shown in Figure 3. A chance node is a node that needs input data, including the value and probability, whereas a value node provides the output of the BN-based computation. "Global Productivity", which affects all project activities, consists of two chance nodes. The outputs of the BN model (i.e., the value nodes) are used as data inputs for the project activity durations in the project network model. For this study, the DecisionTools® suite by Palisade Inc. is used.

Figure 4. Predicted project duration for the respective models
Project progression uses earned value as its indicator. Thus, the EVA-based models such as Earned Schedule (ES), Earned Duration (ED), and Planned Value (PV) yield the same result. Therefore, this stage compares only the performance of the BN model and one representative EVA-based model.

Figure 5 depicts the predicted project progress for the various models. The graph shows the differences in the outcomes of the BN and EVA-based models. Predictions for early periods are similar for both the BN and EVA-based models. Starting from the 15th week, the predictions diverge. MAPE is used to calculate the accuracy of each model. The BN model still shows the best accuracy, with a MAPE of 12.2%, while the EVA model gives a MAPE of 16%.

Figure 5. Estimates of project progress at the 8th period

Table 2. Project Gantt chart in spreadsheet

Table 3. Extant EVA-based models for comparison analysis

Table 5. The t-test