A Statistical Approach to Discovering Process Regime Shifts and Their Determinants

Siddiqui, Atiq W.; Raza, Syed Arshad

doi:10.3390/a15040127

by Atiq W. Siddiqui * and Syed Arshad Raza

College of Business Administration, Imam Abdulrahman Bin Faisal University, Dammam 31451, Saudi Arabia

* Author to whom correspondence should be addressed.

Algorithms 2022, 15(4), 127; https://doi.org/10.3390/a15040127

Submission received: 24 March 2022 / Revised: 10 April 2022 / Accepted: 11 April 2022 / Published: 13 April 2022

(This article belongs to the Special Issue Process Mining and Its Applications)

Abstract:

Systematic behavioral regime shifts inevitably emerge in real-world processes in response to various determinants, thus resulting in temporally dynamic responses. These determinants can be technical, such as process handling, design, or policy elements; or environmental, socio-economic or socio-technical in nature. This work proposes a novel two-stage methodology in which the first stage involves statistically identifying and dating all regime shifts in the time series process event logs. The second stage entails identifying contender determinants, which are statistically and temporally evaluated for their role in forming new behavioral regimes. The methodology is general, allowing varying process evaluation bases while putting minimal restrictions on process output data distribution. We demonstrated the efficacy of our approach via three cases of technical, socio-economic and socio-technical nature. The results show the presence of regime shifts in the output logs of these cases. Various determinants were identified and analyzed for their role in their formation. We found that some of the determinants indeed caused specific regime shifts, whereas others had no impact on their formation.

Keywords:

process mining; regime shifts; time series; temporal behavioral change; socio-technical; socio-economic

1. Introduction

Real-world processes involve internal and external determinants, which act together to produce a certain expected output. These determinants, independently or in response to other factors, change over time, causing systematic and long-term shifts in their actual output patterns. Discovering these shifts and their determinants is thus a vital issue in understanding and managing the behavior of such processes. We note here that the term process is used in a broader sense, not restricted to just business or technological systems [1,2,3] that conform to well-defined aims such as producing specific quantities or qualities [4]. We also consider processes that have complex social, physical, economic or other determinants and interactions within or across their environments [5]. For instance, commodity price or demand generation is a process that has socio-economic determinants such as collective human behavior and geopolitical, economic or even health-related events [6,7,8]. Although there is no well-defined central control in such processes, stakeholders still seek to discover behavioral shifts for obvious reasons. It may also refer to socio-technical processes such as in information systems development and/or adoption [9,10] in timetabling, transportation and healthcare [11,12] and in road traffic systems, in which the output may be in the form of accidents or injuries [13]. In such systems, the concerned stakeholders are interested in discovering process dynamics and their conformity to intended behavior while taking appropriate enhancement measures [14].

In this context, we observe that the related process-mining literature has taken a predominantly technological perspective. More precisely, the evaluated processes are well-defined, centrally managed, isolated from their wider environment (e.g., social or economic), and they behave mechanistically. The literature is rich in this direction, covering process discovery, conformance and enhancement [15], where significant progress has been made in algorithm development, improvement and scalability [16,17,18]. A gap thus exists, as process mining of socio-economic or socio-technical systems has received minimal attention. Understanding overall behavioral dynamics is vital for such systems, in which behavioral regime shift discovery and analysis are crucial unaddressed questions. This has been recently highlighted by Zerbino, Stefanini and Aloini [15] in the business context, pinpointing a lack of coverage of the social side of business problems.

In this work, we aim to address this gap, and we propose a methodology that applies to studying behavioral regime shifts or changes in processes of such varied nature. Specifically, we aim to deal with these dynamic and temporally shifting processes at two levels. The first level involves finding all the behavioral regime shifts [19]. To illustrate, we use the time series event log of two different processes (Figure 1), in which regime changes are evident. In the first example (Figure 1a), we see three distinct regimes: A, B and C, in which the process output level significantly and systematically increased between regimes A and B. Later, between regimes B and C, the level significantly decreased. Similarly, in the direction or trend example (Figure 1b), regime A shows stable behavior. In contrast, regime B shows a systematic rising trend. We also note that we mainly focus on significant, systematic and potentially explainable shifts and not on small random fluctuations. Once all regime shifts are discovered in the first level, the second level focuses on statistically finding their determinants. Although details are presented later, in summary, this phase involves identifying potential determinants, which are then statistically evaluated for their role in the formation of these regime shifts. Clearly, a lack of such knowledge leads to inaccurate process understanding, poor predictability and, inevitably, ineffective management and control.

Formally, we thus translate our aim in this work to the following three linked research questions:

Are there any regime shifts in the process output?
What are the potential determinants that may have affected the regime change?
What role do these potential determinants play in forming these regimes?

We propose a novel algorithm that addresses the above questions in a stepwise fashion. Before discussing it in detail, we first refer to a common process modeling framework, which allows translating different process views to a commonly agreed model, which is then used in regime shift analysis. This framework, which is theoretically grounded, well-defined and applicable to most process scenarios, is borrowed from the recent process modeling literature [20], and it is briefly discussed in Section 3.1.

Once a process is mapped to a model, the algorithm’s first step defines its modeling basis, referring to the aspect of the process output being analyzed. For example, in a particular manufacturing process, the production line analyst may be interested in the output levels. In contrast, the quality control analyst may look to find any systematic trend (positive or negative) in a quality characteristic. These two distinct aims can be captured via respective level and trend regression modeling bases. Once a statistical modeling basis is agreed upon, it is then used with the output event log time series data to test for any regime changes. Since, at this stage, we are concerned only with discovering all regime shifts, this model basis is kept to its minimal form, i.e., the level model has the intercept term only, and the trend model has an intercept and a proxy time-index term capturing any trend element in the data. Since there are no explicit variables for any determinants in this form, it allows unearthing maximum regime shifts.

Once all regime shifts are discovered, the methodology addresses the second question: identifying potential process determinants. This question is mainly dealt with at the modeling stage mentioned earlier, in which a process model is built to identify any related elements. As these determinants are found, relevant time series data from event logs are further gathered. A key aspect to note here is that a track of changes in these determinants on the timeline is needed, which can then be used in the next phase to statistically link any change in the process output to a change in a particular determinant.

The rest of the methodology deals with statistically relating changes in determinants to regime shifts in process output data (the third question). The idea is to sequentially introduce determinants as independent variables to the base model, one at a time. The introduction sequence is based on the first appearance of change on the timeline in these determinants. Each time a determinant variable is introduced, the revised model is statistically fitted and tested again for regime shifts. If the statistical tests now fail to recognize a regime shift discovered in the first stage, we deduce that this determinant played a causal role in forming this shift. This step is repeated until all potential determinants are introduced and tested. Finally, the fitted models can then be analyzed for their significance or impact on the process output. It is also easy to see that this approach can be used in conformance analysis when a determinant is intentionally adjusted for a potential enhancement.

To demonstrate the merits of our proposed approach, we use three case studies from processes in manufacturing, commodity price generation and road traffic safety analysis. These cases vary significantly in terms of their technical, socio-economic and socio-technical nature and their levels of complexity. The results indicate regime shifts in all cases, with varying relationship structures with their respective determinants. The first case is used as a validation or test case, in which actual relationships and regime structure are known in advance, whereas the other two cases are used to demonstrate the use of the methodology in distinct and complex scenarios.

In the rest of the paper, we present a discussion in Section 2 on the recent process mining literature and how our work is positioned within the extant literature. Section 3 then presents the proposed methodology, where we first present the process modeling framework, followed by the methodology and its details. The three case studies and their analyses are presented in Section 4, and conclusions of our study are presented in Section 5.

2. Literature Review

Process mining initially emerged as a workflow mining technique to extract a process model from software engineering data [21,22]; other aspects such as process cost, risk reduction, maximizing productivity, resource utilization and improving quality have progressively gained attention [23]. Although this makes the process mining landscape quite extensive with multiple recent literature review papers available providing a comprehensive account [15,23,24], we mainly discuss those works that help us position our contribution within the extant literature. Moreover, we focus on identifying the application context, the methodology and/or tools used and the aim in terms of process discovery, conformance or enhancement within those works.

Bernardi et al. [25] used smart meter readings to discover anomalous customer behavior in energy usage over time using Hamming distance and cosine similarity. In another paper, Bernardi et al. [26] applied process mining on call traces to characterize smartphone applications for malware detection. Myers et al. [27] investigated cyberattacks in industrial control systems by comparing five algorithms to create an accurate yet simple process model and recommended Inductive Miner as the most suitable algorithm. Sahlabadi et al. [28] introduced genetic process mining to detect deviant user behavior in social media websites. They applied their study on Facebook users by first generating a process model for normal user behavior and then by identifying the abnormal behavior by conformance checking. For web application security, Compagna et al. [29] proposed Aegis—a tool to process mine a target web application to discover its workflow model and consequently enhance its security policies. In the same context, Bernardi et al. [30], using model-driven engineering and process mining, applied Unified Modeling Language to generate the formal model and ProM visualization techniques for deviation identification [31].

In the software development arena, Leppäkoski and Hämäläinen [32] used process mining as an aid for agile (human-centric) development. In contrast, Gupta et al. [33] used it for software maintenance or enhancement. In pre-production software quality assurance, a common process mining-based research trend is either error or bug minimization or their detection. To enhance software reliability, the recent works of Rubin et al. [34] and Xu et al. [35] focused on identifying errors or bugs, and Lübke [36] focused on generating test cases for conformance checking.

In healthcare, one of the early papers by Ciccarese et al. [37] used process discovery for resource analysis and conformance to clinical guidelines. Later, for role description and collaboration among hospital emergency department professionals, Günther and Van Der Aalst [38] and Li et al. [39] proved the usefulness of Fuzzy Miner and MinAdept algorithms, respectively, to deal with inherent unstructured characteristics of healthcare process models. In emergency medicine, process mining has recently been applied for the initial evaluation and diagnosis of accidental injuries, unexpected diseases and coordination among care providers for surgical and non-surgical treatments [40,41,42].

Trcka et al. [43] demonstrated process discovery and conformance checking in education to identify student behavior using the ProM tool. In the same domain, Okoye et al. [44] discovered behavioral patterns and rules for personalized learning with Web Ontology Language and Semantic Web Rule Language. For improving student learning in a university environment, Groba et al. [45] used the SoftLearn tool to analyze student flows based on social networks with a graphical interface.

In the business domain, process mining aims to fill the gap between data mining and business process management [46]. As per Porter’s Value Chain, the literature coverage in this context is divided into primary processes, including logistics, service, marketing and sales, and operations, and into secondary processes, such as infrastructure, procurement, research and development (R&D), and human resource management (HRM) [15]. In business services, Cho et al. [47] redesigned the customer reservation process of the largest travel agencies in Korea. Moreover, Syamsiyah et al. [48] compared form handling process variants within Xerox Services for actionable insights on the most common process behaviors. Similarly, marketing and sales have previously been analyzed using data mining techniques such as clustering, profiling and predictive modeling [49]. The latest research in this area was reported by Măruşter and van Beest [50], who redesigned the booking process of a utility company considering process and time perspectives.

In operations, Roldán et al. [51] exploited process, time and organizational perspectives in an emergency context to detect bottlenecks and inefficiencies in multi-robot missions. Ruschel et al. [52] integrated process mining and Bayesian networks for predicting maintenance intervals for manufacturing equipment. In logistics, Paszkiewicz [53] investigated a warehouse management system of a manufacturing company with conformance checking, and Sutrisnowati et al. [54] assessed lateness probability in container handling. In another work, Repta et al. [55] analyzed event data of a warehouse to reconstruct a process model using Global Positioning System devices and Radio Frequency Identification readers. Finally, in the procurement process, Jans et al. [56], Outmazgin and Soffer [57] and Reijers et al. [58] performed the detection of internal control violations or workarounds in the procurement process, and Fleig et al. [59] streamlined and standardized the IS-supported procurement process of three manufacturing companies with a process-mining-enabled decision support system.

It is clear from the above discussion that the bulk of the process mining literature has primarily focused on problems that may be conceived as technological systems in the sense that their expected behavior is mechanistic due to their expectedly well-defined structure, and they do not have considerable convoluted social and economic determinants. Therefore, a major gap exists in mining socio-economic or socio-technical systems that are prevalent in the wider technological, social, political and economic spheres. Zerbino, Stefanini and Aloini [15] have recently highlighted the same issue of ignoring the social side in business problems. Consequently, a general approach is needed, which may be used to mine complex process behaviors of such systems. We thus propose a methodology that seeks to capture behavioral changes (regime shifts) in such complex systems, whose knowledge can be employed in process conformance and enhancement.

3. Methodology

The proposed methodology sequentially addresses the three research questions stated in Section 1, i.e., revealing any present regime shifts in the process output data and identifying and statistically linking these shifts to the identified potential determinants.

The methodology comprises several steps, including (1) translating different process views to a commonly agreed model. This translation is performed using a generally applicable Stochastic System and Process Modeling Framework [20], built upon the theoretical foundation of Bunge’s Ontology [60,61]. (2) Using this model, a statistical modeling basis is selected as the process output evaluation criterion in the regime change analysis. We note that this criterion may vary according to the output of the process being evaluated and the analyst’s choice. (3) By using a minimal form of this basis (i.e., the statistical model with no explicit determinant variable included), statistical tests are performed to identify and date all regime shifts present in the data. (4) Process determinant variables are then sequentially introduced into the base model according to their change on the timeline and are statistically tested again for regime shifts. If an earlier regime shift disappears, it shows that the introduced determinant has a causal relationship with the regime shift. This step is repeated until all potential determinants are introduced and tested. (5) Lastly, the fitted model is analyzed for its significance or impact on the process output. A detailed account of all these steps is presented in the following sub-sections.

3.1. Process Modeling and Mapping

The most primitive concept of the Stochastic System and Process Modeling (SSPM) framework is a Thing $X =_{df} \langle x, p(x) \rangle$, which is an individual $x$ that exists independently and has some defining properties $p(x) \in P$ ($P$ being the set of all possible properties of $x$). All things possess a state $s(X) \in S(X)$ (where $S(X)$ is the set of all possible states) at any time $t$, which is determined by the current values of its perceived properties called attributes. All these states are determined by some state function following state laws (probabilistic or deterministic). Using these constructs, we define a system $Y$ as a coupled collection of interacting things ($y$), which demonstrate some basic properties (present in constituent things) or emergent properties (not present in constituent things), i.e., $p(y)$. Other things or systems affecting the system shape its environment.

Such a system is expected to execute some process, which is a sequence of unstable system states leading to some stable state (ideally its goal). This is reflected in a change in system attributes, including the considered output (i.e., the attribute of interest). Here, process validity is a key issue, as the process may or may not lead to a stable state within a reasonable time or even to the intended state. In this context, a valid process has a process path (i.e., change in states leading to a stable state) that ends within its generally defined goal states (e.g., a success) and within a finite-time bound. Moreover, the goal state set must be reachable via at least one valid path. Here, we must mention that the sequence of states is dependent on internal and external triggers and events and on the governing transition laws. We further distinguish between properties and system couplings that are directly affected or changed by these triggers and the properties that, as a result, adjust to the change. Here, we refer to the latter as the dependent variable and the former as the independent variable (only in relation to the latter). In this sense, a good process design is one that ends in its goal state, whereas a failed process is one that does not.
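The validity condition described above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes that states are represented as hashable labels and that the finite-time bound is expressed as a maximum number of state transitions. The names `is_valid_process`, `goal_states` and `max_transitions` are hypothetical and do not come from the SSPM framework.

```python
from typing import Hashable, Sequence, Set


def is_valid_process(path: Sequence[Hashable],
                     goal_states: Set[Hashable],
                     max_transitions: int) -> bool:
    """Return True if the path ends in a goal state within the bound.

    Assumption: the last element of `path` is the stable state reached,
    and the time bound is counted in state transitions.
    """
    if not path:
        return False
    transitions = len(path) - 1
    return path[-1] in goal_states and transitions <= max_transitions


# Hypothetical state labels, for illustration only.
example_path = ["idle", "cutting", "inspecting", "done"]
print(is_valid_process(example_path, {"done"}, max_transitions=5))
```

A path that ends outside the goal set, or that needs more transitions than the bound allows, is treated as a failed process in this sketch.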

In reality, a system may face varying internal or external environmental triggers, potentially leading to varying resulting processes. In the process mining context, the question is thus to empirically test if the enacted process is valid and has led to its intended goals. Thus, we can generally represent the problem of interest as shown in Figure 2, where we can identify the System of Interest (SoI) producing the processes, and its properties may allow us to identify both key determinants and the property of interest (a dependent process output) to analyze. Moreover, external determinants can be identified based on external systems and things in the environment.

Finally, we note that the triggers can be intentional, unintentional, visible, or invisible. Accordingly, the process mining task here can be of discovery, conformance, or enhancement. In all cases, the above modeling approach holds.

3.2. Algorithm and Statistical Tests

We now present the method to identify regime shifts in the SoI’s process output (i.e., the dependent attribute of interest) and to find its causal relationship with potential determinants. The high-level algorithm is first presented (Figure 3) and discussed, and details of statistical modeling and testing are discussed in the latter part of this section.

As shown in Figure 3, the procedure requires an initialization phase, in which the first step is to model the system and its properties (denoted as Y and p(y), respectively) via the SSPM framework. The modeling approach is already discussed in Section 3.1 above. Based on p(y), a dependent property p^′(y) ∈ p(y) of interest is identified, whose time series event log data are considered for process mining. Furthermore, all determinants d ∈ D are identified, which may be related properties and/or couplings (internal or environmental). Once the model is formed, the evaluation or the mining criterion is set for p^′. This criterion is based on the analyst’s preference and on the attribute of interest. Accordingly, the criterion is translated to a minimally described model B, i.e., with no determinants initially considered. This minimal model allows the discovery of all regime shifts, irrespective of their causes. Thus, in the first post initialization step, model B is fitted and tested for all regime shifts using the time series data for p^′ with a statistical procedure. This procedure involves determining and dating all present regime shifts, the details of which are discussed later in this section. If no regime change is detected, the procedure is terminated; otherwise, the procedure moves to the phase for modeling and testing the impact of determinants. In this phase, a potential (or even exhaustive) subset is extracted from D (i.e., d^′ ⊆ D). This list is then ranked (D^′: the ranked list) based on the earliest chronological change in respective determinants, i.e., a determinant is placed as the first item in the set, which sees the first change in its state on the timeline, etc. Notably, the ranking and testing in this sequential and incremental manner allow simulating the actual timeline of events that may have caused various regime shifts. 
In this step, when an independent determinant variable, based on its rank, is added to the base model, the updated model (B^′) is then fitted and tested again for regime shifts. During testing, if a regime shift found in the previous phase remains undetected, it is inferred that the determinant has (at least partially) caused the formation of this shift. Finally, the fitted model is used to compare the impact of determinants having a causal relationship with p^′.
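The ranking step described above can be sketched as follows. This is a minimal sketch, assuming each determinant is logged as a time series aligned with the output log and that a "change" means an observation that differs from its predecessor. The function names and the determinant labels are hypothetical, and dropping determinants that never change is a simplifying assumption rather than part of the stated procedure.

```python
from typing import Dict, List, Optional, Sequence


def first_change_index(series: Sequence[float]) -> Optional[int]:
    """1-based time index of the first observation that differs
    from the one before it, or None if the series never changes."""
    for pos in range(1, len(series)):
        if series[pos] != series[pos - 1]:
            return pos + 1
    return None


def rank_determinants(logs: Dict[str, Sequence[float]]) -> List[str]:
    """Order determinants by the earliest change on the timeline.

    Assumption: determinants that never change are left out, since
    they cannot be linked to a shift in this sketch.
    """
    changed = []
    for name, series in logs.items():
        index = first_change_index(series)
        if index is not None:
            changed.append((index, name))
    return [name for _, name in sorted(changed)]


# Hypothetical determinant logs, for illustration only.
logs = {
    "policy_a": [0, 0, 0, 1, 1, 1],
    "policy_b": [0, 1, 1, 1, 1, 1],
    "constant_c": [3, 3, 3, 3, 3, 3],
}
print(rank_determinants(logs))
```

Each name in the returned list would then be added to the base model in turn, with the regime shift tests rerun after every addition.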

Statistical Modeling and Testing

The above procedure requires an appropriate statistical modeling approach and a general testing procedure accommodating varied modeling bases. We discuss both the issues in this section; however, we first refer to the following notations needed in our discussion:

$i :$ Time index used with the time series event log data (i.e., $i = 1, 2, \dots, n$ for n observations)
$p_{i}^{'} :$ Time series event log data of evaluated dependent property at time i
$d_{i}^{} :$ A vector of determinant regressors of size $| D^{'} | \times 1$ , at the time i
$α_{i} :$ A vector of model coefficients for all $d_{i}^{}$
$ε_{i} :$ Random noise at the time i
$m :$ The number of regime shifts in data
$M_{m, n} :$ Set of all regime shift points
$s :$ Regime index. With m regime shift points, we have m + 1 regimes.

In the above notations, $p_{i}^{'}$ is the time series event log for the dependent system property and $d_{i}$ is the same for the considered determinants. As demonstrated in the initialization phase of the algorithm (Figure 3), we identified the need for an evaluation criterion, which is used to form the base statistical model. Specifically, the base model form is dependent on the nature of the considered properties, determinants and evaluation criteria.

To illustrate, we present two scenarios and their corresponding general model forms. The first scenario involves a property $p_{i} \in ℝ$, which is assumed to be normally distributed. This may be the case of a product's quality characteristics produced by a manufacturing process. The second one is the case in which $p_{i} \in I$ represents some count data, which thus follow a Poisson or a negative binomial distribution. This case is prevalent in, e.g., traffic accident data involving accident counts. For the two cases, we require separate modeling bases, which in generalized terms are represented, respectively, as:

$$p_{i}^{'} = d_{i}^{T} α_{i} + ε_{i} \quad (i = 1, 2, \dots, n) \tag{1a}$$

$$\log (p_{i}^{'}) = d_{i}^{T} α_{i} + ε_{i} \quad (i = 1, 2, \dots, n) \tag{1b}$$

As demonstrated above, modeling bases for other situations can be formed similarly. Although the above generalized and fully defined models show a relationship structure between the dependent variable and all the independent determinant variables, the initial requirement of the procedure is the minimal model form, which is used to discover all regime changes in $p_{i}^{'}$. Thus, the above generalized but fully defined model form can be rewritten in its minimal form as:

$$p_{i}^{'} = t_{i} α_{i} + ε_{i} \quad (i = 1, 2, \dots, n) \tag{2a}$$

$$\log (p_{i}^{'}) = t_{i} α_{i} + ε_{i} \quad (i = 1, 2, \dots, n) \tag{2b}$$

Here, we removed variables for all considered determinants in Equation (2a) and Equation (2b) and replaced them with a simple time index term t_i. This term in the minimal model captures any trend in the data. Later, respective determinant regressor variables are added when evaluating their roles.
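To make the minimal forms concrete, the following sketch fits the two bases mentioned in the Introduction by ordinary least squares: a level model with an intercept only, and a trend model with an intercept and a time index $t_i = 1, \dots, n$. It is an illustrative sketch for the normally distributed case of Equation (2a); the function names are hypothetical, and the closed-form formulas are the standard ones for simple linear regression.

```python
from typing import Sequence, Tuple


def fit_level(y: Sequence[float]) -> Tuple[float, float]:
    """Intercept-only model: returns (intercept, sum of squared residuals)."""
    intercept = sum(y) / len(y)
    ssr = sum((value - intercept) ** 2 for value in y)
    return intercept, ssr


def fit_trend(y: Sequence[float]) -> Tuple[float, float, float]:
    """Intercept plus time index t_i = 1..n (requires n >= 2).

    Returns (intercept, slope, sum of squared residuals).
    """
    n = len(y)
    t = range(1, n + 1)
    t_bar = (n + 1) / 2
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    ssr = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    return intercept, slope, ssr


# Made-up output log with a perfectly linear trend.
print(fit_level([3.0, 5.0, 7.0, 9.0]))
print(fit_trend([3.0, 5.0, 7.0, 9.0]))
```

For the count-data basis of Equation (2b), the same functions could be applied to the logged series, provided that all counts are strictly positive.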

Statistical testing: For any of the above model structures, the regime shift problem reduces to testing for a change in the coefficients $α_{i}$. In other words, if the coefficients remain stable throughout the timeline, there is no regime change. On the other hand, a change in even one of the coefficients means a regime change. Accordingly, we define the null and the alternative hypotheses as follows:

$$\begin{array}{l} H_{0} : α_{i} = α_{0} \quad (i = 1, \dots, n) \\ H_{1} : \text{at least one } α_{i} \neq α_{0} \quad (i = 1, \dots, n) \end{array} \tag{3}$$

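For the general case of m regime shifts, we have m + 1 segments, each with its own set of stable coefficients; that is, there are m + 1 regimes. In this case, the minimal models presented in Equation (2a) and Equation (2b) can be rewritten as:

$$p_{i}^{'} = t_{i} α_{i} + ε_{i} \quad (i = i_{j - 1} + 1, \dots, i_{j}, \; j = 1, \dots, m + 1) \tag{4a}$$

$$\log (p_{i}^{'}) = t_{i} α_{i} + ε_{i} \quad (i = i_{j - 1} + 1, \dots, i_{j}, \; j = 1, \dots, m + 1) \tag{4b}$$

where $i_{1}, \dots, i_{m}$ are the regime shift points, $α_{i}$ is constant within each segment and, by convention, $i_{0} = 0$ and $i_{m + 1} = n$.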
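As an illustration, consider a single shift ($m = 1$) under the level basis, where the regressor reduces to a constant. The superscripted intercepts $α^{(1)}$ and $α^{(2)}$ are notation introduced here for illustration only; $i_{1}$ denotes the single shift point.

```latex
p_{i}^{'} =
\begin{cases}
α^{(1)} + ε_{i}, & i = 1, \dots, i_{1} \\
α^{(2)} + ε_{i}, & i = i_{1} + 1, \dots, n
\end{cases}
```

Under the null hypothesis of Equation (3), $α^{(1)} = α^{(2)}$, and the two segments collapse into a single regime.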

We also note that the number and locations of regime shifts are generally unknown for this scenario. Thus, all shifts need to be empirically determined. Moreover, as we are not limiting ourselves to a particular model form, a generally applicable statistical testing approach is needed that is valid for the respective assumptions applied to these models. Accordingly, we resort to a statistical procedure that tests for an unknown number of regime changes and finds their number and locations. These shift dates are then used to fit the model for stable regime segments. The procedure rests on two statistical tests developed by Bai and Perron [62,63] and Zeileis et al. [64]. The tests are quite robust in terms of assumptions on the nature and distribution of both regressors and noise. To further explain the nature of these tests, we use the tests developed by Bai and Perron [62] that are applicable to a single regime change. The first is an F-test [65], in which only one shift, of unknown timing, is assumed. The F-statistic used is:

$$F_{i} = \frac{\hat{ε}^{T} \hat{ε} - \hat{ε}(i)^{T} \hat{ε}(i)}{\hat{ε}(i)^{T} \hat{ε}(i) / (n - 2k)} \tag{5}$$

The test evaluates the alternative of a regime shift at each candidate time $i$, yielding a sequence of $F_{i}$ statistics for $i = n_{h}, \dots, n - n_{h}$ ($n_{h} \geq k$, where $k$ is the number of regressors in the model). Here, $\hat{ε}$ denotes the residuals of the model fitted without a regime shift, and $\hat{ε}(i)$ denotes the residuals of the model fitted with a shift at time $i$, so that the two competing models, i.e., without and with regime segments, are compared. The null hypothesis is rejected when the supremum of these statistics exceeds a certain threshold. We note here that the index $h$ in the term $n_{h}$ is a parameter that sets a reasonable minimum segment length as a fraction of the sample, and thus $n_{h}$ is computed as $\lfloor n h \rfloor$. In other words, based on the system and process scenario, a minimum period length is defined for a shift to be considered. For the generalized case of m regimes, we use a test variant developed by Bai and Perron [62,63]. In this case, the same evaluation is performed for m vs. m + 1 regime shifts. We also make use of a second test, based on the generalized fluctuation testing framework [64]. In this test, the model is fitted to the data, and an empirical fluctuation process, whose limiting behavior is governed by a functional central limit theorem, is derived from the residuals [64]. Any excessive fluctuation in the trajectory of this process suggests a deviation from the null hypothesis. We refer the readers to [62,63,64] for full details of both procedures.
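The single-shift F-test can be illustrated with a short sketch for the level basis, where the only regressor is the intercept (so $k = 1$). The code computes the $F_{i}$ sequence of Equation (5) and returns the candidate time with the largest statistic. It is a simplified sketch rather than the full procedure of [62,63]: it does not compute critical values, the default $h = 0.15$ is an assumed choice, and the function names and data are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def ssr_level(y: Sequence[float]) -> float:
    """Sum of squared residuals of an intercept-only fit."""
    mean = sum(y) / len(y)
    return sum((value - mean) ** 2 for value in y)


def f_statistics(y: Sequence[float], h: float = 0.15) -> List[Tuple[int, float]]:
    """F_i for every candidate shift time i = n_h, ..., n - n_h.

    Observations 1..i form the first segment, i+1..n the second.
    Assumption: the data are noisy enough that the segmented
    residual sum of squares is never exactly zero.
    """
    n, k = len(y), 1
    n_h = max(math.floor(n * h), k)
    ssr_null = ssr_level(y)
    stats = []
    for i in range(n_h, n - n_h + 1):
        ssr_alt = ssr_level(y[:i]) + ssr_level(y[i:])
        stats.append((i, (ssr_null - ssr_alt) / (ssr_alt / (n - 2 * k))))
    return stats


def sup_f(y: Sequence[float], h: float = 0.15) -> Tuple[int, float]:
    """Candidate time and value of the largest F statistic."""
    return max(f_statistics(y, h), key=lambda pair: pair[1])


# Made-up log: level near 0 for 10 observations, then near 5.
y = [0.1 * (-1) ** t for t in range(10)] + [5 + 0.1 * (-1) ** t for t in range(10)]
print(sup_f(y))
```

In practice, the supremum would be compared against the critical values tabulated for this test before a shift is declared.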

Once these tests confirm the presence of a regime shift, the points where the shifts occurred are dated, and their confidence intervals are determined. The dates are needed to specify the segments’ timeline ranges and to fit the corresponding models. For an arbitrary candidate set of dates, the sum of squared residuals (R) of the fitted model is

R (i_{1}, \dots, i_{m}) = \sum_{j = 1}^{m + 1} r (i_{j - 1} + 1, i_{j})

(r being the sum of squared residuals of a regime segment, with i_0 = 0 and i_{m+1} = n). Thus, the estimated regime shift dates are the ones that globally minimize this function, i.e.,

({\hat{i}}_{1}, \dots, {\hat{i}}_{m}) = \underset{i_{1}, \dots, i_{m}}{\arg \min} R (i_{1}, \dots, i_{m})

[63]. Because an exhaustive search becomes computationally expensive as the number of segments grows, Bai and Perron [63] proposed a dynamic programming method, in which the optimal segmentation satisfies the recursion

R (M_{m, n}) = \min_{m n_{h} \leq i \leq n - n_{h}} [R (M_{m - 1, i}) + r (i + 1, n)]

. They also suggested a method for finding confidence intervals. Their method puts minimal restrictions on the distributions of the data and the residuals and is thus suitable for varying cases. Once the regime segments are found, the potential determinants can be evaluated for their role in forming them. The same statistical tests apply in this case as well. The only difference is that a determinant variable is introduced into the model according to its rank, and the model is then re-tested for regime shifts using the above procedure. If the tests no longer detect a shift that was found earlier, this implies that the introduced determinant played a role in its formation. The same procedure is repeated every time a determinant variable is introduced. When all potential determinants are tested, the fitted models are analyzed for their scale of impact on

p_{i}^{'}

.
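The dynamic programming recursion above can be illustrated with a short Python sketch for level-only segments. It is a simplified illustration under stated assumptions rather than the procedure of [63] in full: the function names and the toy data are hypothetical, each segment is fitted by its mean only, and confidence intervals are not computed.

```python
def make_segment_rss(y):
    """Return r(a, b): RSS of a constant fit on observations a..b (1-based, inclusive)."""
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def r(a, b):
        count = b - a + 1
        total = s1[b] - s1[a - 1]
        return (s2[b] - s2[a - 1]) - total * total / count

    return r


def date_breaks(y, m, n_h):
    """Return (minimal RSS, [i_1, ..., i_m]) for m shifts and minimum segment length n_h."""
    n = len(y)
    r = make_segment_rss(y)
    inf = float("inf")
    # cost[j][e]: minimal RSS when observations 1..e are split by j shifts
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    back = [[0] * (n + 1) for _ in range(m + 1)]
    for e in range(n_h, n + 1):
        cost[0][e] = r(1, e)
    for j in range(1, m + 1):
        for e in range((j + 1) * n_h, n + 1):
            # last shift i satisfies j * n_h <= i <= e - n_h
            for i in range(j * n_h, e - n_h + 1):
                candidate = cost[j - 1][i] + r(i + 1, e)
                if candidate < cost[j][e]:
                    cost[j][e] = candidate
                    back[j][e] = i
    # walk the stored choices backwards to recover the shift dates
    breaks, e = [], n
    for j in range(m, 0, -1):
        e = back[j][e]
        breaks.append(e)
    return cost[m][n], breaks[::-1]


# Hypothetical series with shifts after observations 30 and 60
levels = [10.0] * 30 + [13.0] * 30 + [11.0] * 40
y = [lvl + ((j * 37) % 11 - 5) / 50.0 for j, lvl in enumerate(levels)]
total_rss, shift_dates = date_breaks(y, m=2, n_h=10)
print("estimated shift dates:", shift_dates)
```

Reusing the optimal costs of shorter prefixes is what keeps the search tractable; this sketch needs on the order of m·n² segment evaluations instead of enumerating every combination of dates.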

4. Case Studies

In this section, we analyze three cases which are of technical, socio-economic and socio-technical nature, respectively. The first case involves a steel rod cutting process, producing several rods per day of a particular design specification. The objective is to discover if and when the process systematically deviates (perhaps due to tool breakage) from its quality objective, i.e., a specified rod diameter. The second case involves the oil price generation process. Here, the mining task aims to discover all regime shifts in the oil price data and statistically test if deaths caused by COVID-19 produced any regime changes. Finally, we present the case of deaths caused by road accidents in a state of a Middle Eastern country. The state underwent a road safety program, and the government wanted to evaluate whether three of its key initiatives made any significant impact in improving the existing situation.

4.1. Case 1: Manufacturing Quality

The first case involves a technological process. More precisely, we evaluate the output of a turning process, which originates from a simple lathe machine system. This system comprises several interacting components (or things), including structural body parts, a cutting tool, a motor and a steel rod that is clamped and rotated for cutting. Although several basic and emergent properties can be identified for this system, the dependent property of interest is the rod diameter, which changes when the turning process is enacted. The aim of the process is to reduce the diameter of the rod to a specified level (Figure 4). This process is continuously repeated to manufacture several hundred rods. After a rod is produced, the reduced diameter is logged and checked to ensure its acceptability. The process is prone to tool breakage (tool length being an internal determinant property), which causes a sudden increase in the diameters being produced. The intention is to catch such a regime shift in the diameter data so that the tool can be replaced. We ignore any external environmental determinants (e.g., room temperature) in this analysis for simplicity.

This case is selected mainly for two reasons. First, it is a simple technological process that is easily visualizable, involving just one dependent property, i.e., the diameter of the rod, and one independent internal process determinant, i.e., the tool length. Second, as we exactly know when the tool was broken (nearly 5 mm tool tip lost when the 80th rod was processed), causing a regime change in our diameter log at the corresponding time, we use this data to validate the efficacy of the methodology.

The time series event log for this case is presented in Figure 5, which reports data for 120 steel rods produced in sequence. The intended reduced diameter is 100 mm. An increase in the diameter can be seen at the 80th rod, when the tool breakage occurred. As the tool continued to be used, an increase of approximately 5 mm is clearly visible in the diameter of the remaining rods, matching the length lost from the tool tip. For this problem, the determinant data was the tool length, as recorded after every cutting run.

To test and validate the proposed methodology, the criterion used is the level basis, in which the minimal base model via Equation (1a) and Equation (1b) is simply reduced to

p_{i}^{'} = ε_{i}

, where

p_{i}^{'}

represents the diameter of the rod. Using this minimal base model, we tested the rod diameter time series data for regime changes. The results from both statistical tests are shown in Figure 6a,b, respectively, which clearly show a deviation and a peak well outside the critical band (indicated by red lines) at around the 80th data point, indicating at least one regime shift.

We then moved to the steps for dating the shift(s) and producing the corresponding segmented minimal base models. The results for both these aspects are shown in Figure 7, Table 1 and Table 2. First, they indicate that, with confidence limits at 2.5% and 97.5% (Table 1), the shift occurred between rods 79 and 81 (most likely at the 80th rod). Second, they show that two data segments are created, i.e., the first time series segment covering rods 1 to 79 and the second covering rods 80 to 120. The fitted model segments (Table 2) are also graphically depicted in Figure 7, which shows the average diameter to be around 100.05 mm in the first segment and 105.12 mm in the second. A single level model fitted to the whole data is also shown (grey dashed line) in the figure to demonstrate why it is important to identify the various regimes and fit a separate model for each.
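To make the last point concrete, the level of a single pooled model can be worked out from the segment sizes and the segment averages quoted above. The short Python calculation below is only an illustration; it assumes that each fitted level equals its segment mean, so that the pooled least-squares level is the size-weighted average of the two.

```python
# Segment sizes and fitted levels as quoted in the text (rods 1-79 and 80-120)
n1, level1 = 79, 100.05   # first regime, mm
n2, level2 = 41, 105.12   # second regime, mm

# A level-only model fitted by least squares to all 120 rods equals the
# overall mean, i.e., the size-weighted average of the two segment levels.
pooled = (n1 * level1 + n2 * level2) / (n1 + n2)

print(f"pooled level: {pooled:.2f} mm")
print(f"offset from regime 1: {pooled - level1:+.2f} mm")
print(f"offset from regime 2: {pooled - level2:+.2f} mm")
```

Under this assumption the pooled level is about 101.8 mm, a value that describes neither regime: it lies well above the first segment and well below the second.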

We then evaluated the role of the tooltip length and any breakage in forming the new regime. Accordingly, the minimal model is revised to include tooltip length as an internal determinant variable. The base model changes to

p_{i}^{'} = d_{i} α_{i} + ε_{i}

, where d_i (ToolTip being the variable name) is the new independent variable for the tooltip length. The updated model was tested again, and the statistical tests failed to detect the regime change found earlier (Figure 8a,b). The details of the new fitted model are shown in Table 3, which shows that the ToolTip variable is highly significant (Pr(>|z|) < 2 × 10⁻¹⁶ ***). The graphical plot of the fitted model with no regime changes is shown in Figure 9.
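The logic of this step can be sketched in Python on hypothetical data. The sketch makes several assumptions that are not part of the study: the tool tip and diameter values are invented, the determinant is fitted by simple least squares, and, as a simplification, the level-shift sup-F statistic is applied to the regression residuals instead of testing the stability of all coefficients.

```python
import math


def rss(values):
    """Residual sum of squares of a constant (level-only) fit."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sup_f_level_shift(y, h=0.15):
    """sup-F statistic for a single level shift (h is an assumed trimming value)."""
    n = len(y)
    n_h = max(math.floor(n * h), 1)
    rss_null = rss(y)
    best = float("-inf")
    for i in range(n_h, n - n_h + 1):
        rss_alt = rss(y[:i]) + rss(y[i:])
        best = max(best, (rss_null - rss_alt) / (rss_alt / (n - 2)))
    return best


def ols_residuals(x, y):
    """Residuals of the simple regression y = a + b * x fitted by least squares."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical log: the tool tip loses 5 mm before the 80th rod is cut
tool_tip = [100.0] * 79 + [95.0] * 41
diameter = [200.0 - t + ((i * 37) % 11 - 5) / 50.0 for i, t in enumerate(tool_tip)]

f_before = sup_f_level_shift(diameter)                            # level-only model
f_after = sup_f_level_shift(ols_residuals(tool_tip, diameter))    # with determinant
print(f"sup F without determinant: {f_before:.1f}")
print(f"sup F with determinant:    {f_after:.2f}")
```

In this toy setting the statistic collapses once the determinant is in the model, which is the pattern interpreted in the text as the determinant accounting for the shift.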

4.2. Case 2: Gasoline Price and Impact of COVID-19

The second case involves the process of crude oil spot price generation. This process arises from a complex socio-economic system involving oil producers, transporters, governments, end-users and several other factors as its components or things. It also includes complex interactions between these things, which may take the form of politics, trade and conflicts. The process is socially and economically complex, and its many identified or unidentified determinants make the boundaries of the system and its environment unclear. For now, we treat this system as a black box, in which the main output or dependent variable of interest is the weekly WTI oil spot price. The event log analyzed covers the period of 1 January 2015 to 31 December 2021 (Figure 10) and was obtained via the U.S. Energy Information Administration website (https://www.eia.gov/ accessed on 1 February 2022).

The aim of this mining exercise is twofold. First, we seek to identify all regime shifts. Second, we aim to evaluate the role of COVID-19 deaths in the oil price formation out of many determinants. As the planning horizon is long, we can safely assume several time-dependent determinants, such as population and industrial growth. Accordingly, the minimal base model used has both the level and the trend components, i.e., we use

p_{i}^{'} = t_{i} α_{i} + ε_{i}

as suggested in Equation (2a) and Equation (2b). This model, as we recall, has a proxy time index term to capture any trend in the data. We thus used this minimal base model and tested the price data for regime changes. For brevity, we directly present the graphical results in Figure 11. The results indicate twelve regime shifts during the analysis period, the location and confidence intervals of which are indicated in Table 4.

To test whether the deaths caused by COVID-19 impacted the formation of any of these breaks, we used the COVID-19 deaths log (Figure 12) obtained from Our World in Data (https://ourworldindata.org/ accessed on 1 February 2022). We tested the model again with the added determinant variable for COVID-19 and found that the regime shift located at week 262 was no longer detected by the statistical tests. This week corresponds to the second week of January 2020, in which a sharp rise in deaths started to appear. The refitted model with one less segment is shown in Figure 13. Other determinants can be tested for regime changes in the same way.

4.3. Case 3: Deaths in Road Accidents and Impact of the Safety Program

The third case concerns the road traffic system, which can be classified as a socio-technical system. This system involves several things, including a road network infrastructure, road safety rules, cars and drivers, among others. Several environmental factors, such as weather, may also be important. For this system, we are interested in mining the process that generates major road accidents leading to deaths. We mined a ten-year log obtained for a major Middle Eastern region (Figure 14). The main determinants considered are road safety measures implemented during the same period. These measures include strict seatbelt penalty laws, which came into full effect by January 2015. Following this, cameras for detecting red-light running were introduced in March 2015. Finally, automatic detection cameras with heavy penalties for speeding were installed by 1 March 2017.

The mining objectives are again twofold, i.e., determining whether there are any positive or negative regime changes in deaths in road accidents, and whether the safety measures played any role in improving the existing traffic safety situation. In contrast to the first two cases, here we are dealing with count data, which may follow a Poisson or a negative binomial distribution, depending on how the variance compares with the mean. We found a mean of 60.25 deaths/month and a variance of 406.91; because the variance far exceeds the mean, the data are overdispersed, and we therefore considered a generalized linear regression model with a negative binomial distribution. Considering both the level and trend, the minimal form of the model turns out to be

\log (p_{i}^{'}) = t_{i} α_{i} + ε_{i}

(as suggested via Equation (2a) and Equation (2b)). The model was tested for regime changes, and the results are shown in Figure 15. The results show two regime changes and hence three regimes: a sharp rise in deaths is evident in the first regime, followed by a lowering of the death rate in the second. Finally, we clearly see a negative trend in the third regime.
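The distributional choice above rests on comparing the reported mean and variance, which can be checked with a few lines of Python. This is an illustrative calculation only; the method-of-moments size parameter assumes the common parameterization Var = μ + μ²/k, which is an assumption because the text does not state one.

```python
mean_deaths = 60.25   # deaths per month, as reported in the text
var_deaths = 406.91   # variance, as reported in the text

# A Poisson model implies variance == mean, i.e., a dispersion index of 1.
dispersion_index = var_deaths / mean_deaths

# Method-of-moments size parameter under the assumed form Var = mu + mu**2 / k
nb_size = mean_deaths ** 2 / (var_deaths - mean_deaths)

print(f"dispersion index (variance / mean): {dispersion_index:.2f}")
print(f"negative binomial size parameter k: {nb_size:.2f}")
```

With the variance roughly 6.8 times the mean, a Poisson assumption would be hard to defend, which is consistent with the negative binomial model adopted here.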

Since the multiple new safety measures were introduced at varying times, we ranked these measures based on their occurrences and tested them in the same sequence. The results of this incremental procedure are shown in Table 5.

The first iteration shows the results for the base model with level and trend only. Both the level and the trend terms turn out to be significant in all three segments. When the seatbelt variable was introduced, the detected regime shifts remained in place, i.e., it did not account for any of them. This determinant seems to have a mild effect only in the second regime segment. A similar outcome is evident in the case of red-light cameras. However, when speed cameras were introduced, the second shift was no longer detected, suggesting that the last segment’s negative trend is caused by this measure.
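One possible way to prepare the three determinants for this incremental procedure is sketched below in Python. The encoding is an assumption made for illustration: the text does not state how the measures were coded, so each is represented here as a hypothetical 0/1 step indicator that switches on in its effective month, using the monthly range of Figure 14 and the dates given above.

```python
from datetime import date


def month_index(d, start):
    """Number of whole months from start to d (0 for the first observation)."""
    return (d.year - start.year) * 12 + (d.month - start.month)


def step_indicator(n_obs, start, effective):
    """0 before the measure takes effect, 1 from its effective month onward."""
    first_on = month_index(effective, start)
    return [1 if i >= first_on else 0 for i in range(n_obs)]


start = date(2009, 12, 1)                            # first month in Figure 14
n_obs = month_index(date(2019, 12, 1), start) + 1    # monthly observations

# Measures ranked by date of introduction (dates as given in the text);
# the variable names are hypothetical.
measures = [
    ("seatbelt", date(2015, 1, 1)),
    ("red_light_camera", date(2015, 3, 1)),
    ("speed_camera", date(2017, 3, 1)),
]
indicators = {name: step_indicator(n_obs, start, eff) for name, eff in measures}

for name, series in indicators.items():
    print(name, "first active observation:", series.index(1) + 1)
```

Each indicator would then be added to the model in this order, and the regime shift tests repeated after every addition.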

5. Discussion

Regime shift detection is paramount for real-world processes, such as financial and economic planning and manufacturing, in making operational and strategic decisions. However, as highlighted in Section 2, the extant literature is primarily focused on well-defined processes, lacking the consideration of external factors. Additionally, we found a consistent, though implicit assumption that these processes’ overall underlying behavioral mechanism remains unchanged. This assumption is highly questionable, as shifts in such mechanisms do frequently occur, as exemplified in cases discussed in this paper (Section 1 and Section 4). Thus, our work addresses this significant shortcoming in the process mining literature through a novel and generally applicable approach that detects the presence of shifts in the process behavior and their locations and causal determinants. This vital contribution significantly adds to the process mining literature in all three dimensions of discovery, conformance and enhancement. Although its role in process discovery is obvious, the proposed methodology is equally applicable to process enhancement agendas in terms of its behavioral shift conforming to intended objectives.

This methodology was employed to analyze three distinct cases of technological, socio-economic and socio-technical nature. The varying nature of the cases demonstrates the applicability of the methodology in broader contexts. The analysis results show regime changes in all three cases, and various determinants were identified and analyzed for their role in the formation of these different regimes. In the first case of a manufacturing process, a regime shift was found where the tool breakage happened. In the second case of spot oil prices, we found twelve regime shifts, out of which the ninth regime shift turned out to be due to the occurrence of COVID-19. Finally, in the third road accident mortality case, we found that speed cameras have the most significant effect in reducing the occurrence of deaths in road accidents.

This work is useful for industry practitioners, who can use it to detect such shifts in the processes under their supervision and to address them based on the identified determinants. For academics, this methodology offers a new perspective, in which processes can be evaluated on a more realistic segmented view rather than as a whole.

6. Conclusions

In this paper, we presented a novel methodology that is used to identify regime shifts in processes of varied nature. Despite the evidence of the use of process mining on a broader set of applications, we observed that its use is limited to well-defined processes that are centrally managed, isolated from their wider socio-economic environments, and which behave mechanistically. Hence, there is an evident lack of attention to complex socio-economic or socio-technical processes. This motivated us to develop a generally applicable methodology that firstly focuses on identifying behavioral regime shifts from the process event logs. Secondly, the methodology extends further in providing an approach that allows identifying and statistically relating determinants in forming these regime shifts. Thirdly, significant determinants are analyzed for their impact on process output. We have demonstrated the application and use of this methodology via three case studies, in which the importance and criticality of detecting behavioral shifts in terms of understanding or controlling the processes are highlighted.

As a major limitation, the proposed methodology is currently developed only for a single dependent variable scenario. In the future, it may be extended to a point in which multiple dependent variables are simultaneously considered. Furthermore, other modeling bases (e.g., volatility or AI-based) need to be integrated into the methodology. Moreover, this methodology can be developed for composite modeling bases. Another shortcoming of this work is that the methodology was tested only on three cases. A more extensive set of cases of varied nature needs to be investigated to fully justify and evaluate its performance. Finally, the proposed approach is ad hoc to problem application, as the choice of determinants is contingent upon the problem being addressed and thus needs to be treated accordingly.

Author Contributions

Conceptualization, A.W.S. and S.A.R.; methodology, A.W.S.; software, A.W.S.; validation, S.A.R.; data curation, A.W.S. and S.A.R.; writing—original draft preparation, A.W.S. and S.A.R.; writing—review and editing, S.A.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Datasets used are from publicly available sources.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Klocke, F.; Kuchle, A. Manufacturing Processes; Springer: Berlin/Heidelberg, Germany, 2009; Volume 2.
2. Tariq, Z.; Khan, N.; Charles, D.; McClean, S.; McChesney, I.; Taylor, P. Understanding contrail business processes through hierarchical clustering: A multi-stage framework. Algorithms 2020, 13, 244.
3. Wand, Y.; Weber, R. On the deep structure of information systems. Inf. Syst. J. 1995, 5, 203–223.
4. Duffuaa, S.O.; Siddiqui, A.W. Process targeting with multi-class screening and measurement error. Int. J. Prod. Res. 2003, 41, 1373–1391.
5. Siddiqui, A.W.; Ben-Daya, M. Reliability centered maintenance. In Handbook of Maintenance Management and Engineering; Springer: London, UK, 2009; pp. 397–415.
6. Siddiqui, A.W.; Basu, R. An empirical analysis of relationships between cyclical components of oil price and tanker freight rates. Energy 2020, 200, 117494.
7. Raza, S.A.; Siddiqui, A.W. Discovering COVID-19 Induced Shifts in Refined Petroleum Products Demand: A Sequence-based Time Series Mining Approach. In Proceedings of the SmartWorld-2021|The 7th IEEE Smart World Congress, Atlanta, GA, USA, 18–21 October 2021.
8. Siddiqui, A.; Verma, M. A CVaR Approach to Planning Crude Oil Tanker Fleet. In Proceedings of the Manufacturing and Service Operations Management (MSOM) Conference, Toronto, ON, Canada, 29–30 June 2015.
9. Raza, S.A.; Siddiqui, A.W.; Standing, C. Exploring systemic problems in IS adoption using critical systems heuristics. Syst. Pract. Action Res. 2019, 32, 125–153.
10. Raza, S.A.; Standing, C. Towards a systemic model on information systems’ adoption using critical systems thinking. J. Syst. Inf. Technol. 2010, 12, 196–209.
11. Raza, S.A. Managing ethical requirements elicitation of complex socio-technical systems with critical systems thinking: A case of course-timetabling project. Technol. Soc. 2021, 66, 101626.
12. Siddiqui, A.W.; Raza, S.A. A general ontological timetabling-model driven metaheuristics approach based on elite solutions. Expert Syst. Appl. 2021, 170, 114268.
13. Mannering, F.L.; Bhat, C.R. Analytic methods in accident research: Methodological frontier and future directions. Anal. Methods Accid. Res. 2014, 1, 1–22.
14. Raza, S.A. A paradigm shift to ethical decision-making—Incorporating systemic epistemology into complex socio-technical decision support systems research. J. Decis. Syst. 2022, 1–24.
15. Zerbino, P.; Stefanini, A.; Aloini, D. Process science in action: A literature review on process mining in business management. Technol. Forecast. Soc. Chang. 2021, 172, 121021.
16. Weijters, A.; Ribeiro, J.T.S. Flexible heuristics miner (FHM). In Proceedings of the 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Paris, France, 11–15 April 2011; pp. 310–317.
17. Leemans, S.J.; Fahland, D.; Van Der Aalst, W.M. Discovering block-structured process models from event logs containing infrequent behaviour. In Proceedings of the International Conference on Business Process Management, Beijing, China, 26–30 August 2013; Springer: Cham, Switzerland, 2013; pp. 66–78.
18. Leemans, S.J.; Fahland, D.; Van der Aalst, W.M. Scalable process discovery and conformance checking. Softw. Syst. Model. 2018, 17, 599–631.
19. Marrs, T. Introduction to Matrix Profiles: A Novel Data Structure for Mining Time Series. 2019. Available online: https://towardsdatascience.com/introduction-to-matrix-profiles-5568f3375d90 (accessed on 1 February 2022).
20. Siddiqui, A.W. An ontological process modelling framework for stochastic systems. Int. J. Gen. Syst. 2016, 45, 803–814.
21. Cook, J.E.; Wolf, A.L. Automating process discovery through event-data analysis. In Proceedings of the 1995 17th International Conference on Software Engineering, Seattle, WA, USA, 23–30 April 1995; p. 73.
22. Cook, J.E.; Wolf, A.L. Discovering models of software processes from event-based data. ACM Trans. Softw. Eng. Methodol. (TOSEM) 1998, 7, 215–249.
23. Dos Santos Garcia, C.; Meincheim, A.; Faria, E.R., Jr.; Dallagassa, M.R.; Sato, D.M.V.; Carvalho, D.R.; Santos, E.A.P.; Scalabrin, E.E. Process mining techniques and applications—A systematic mapping study. Expert Syst. Appl. 2019, 133, 260–295.
24. Macak, M.; Daubner, L.; Sani, M.F.; Buhnova, B. Process mining usage in cybersecurity and software reliability analysis: A systematic literature review. Array 2021, 13, 100120.
25. Bernardi, S.; Trillo-Lado, R.; Merseguer, J. Detection of integrity attacks to smart grids using process mining and time-evolving graphs. In Proceedings of the 2018 14th European Dependable Computing Conference (EDCC), Iasi, Romania, 10–14 September 2018; pp. 136–139.
26. Bernardi, M.L.; Cimitile, M.; Distante, D.; Martinelli, F.; Mercaldo, F. Dynamic malware detection and phylogeny analysis using process mining. Int. J. Inf. Secur. 2019, 18, 257–284.
27. Myers, D.; Radke, K.; Suriadi, S.; Foo, E. Process discovery for industrial control system cyber attack detection. In Proceedings of the IFIP International Conference on ICT Systems Security and Privacy Protection, Rome, Italy, 29–31 May 2017; Springer: Cham, Switzerland, 2017; pp. 61–75.
28. Sahlabadi, M.; Muniyandi, R.C.; Shukur, Z. Detecting abnormal behavior in social network websites by using a process mining technique. J. Comput. Sci. 2014, 10, 393.
29. Compagna, L.; dos Santos, D.R.; Ponta, S.E.; Ranise, S. Aegis: Automatic enforcement of security policies in workflow-driven web applications. In Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, Scottsdale, AZ, USA, 22–24 March 2017; pp. 321–328.
30. Bernardi, S.; Alastuey, R.P.; Trillo-Lado, R. Using process mining and model-driven engineering to enhance security of web information systems. In Proceedings of the 2017 IEEE European Symposium on Security and Privacy Workshops (EuroS & PW), Paris, France, 26–28 April 2017; pp. 160–166.
31. Verbeek, H.; Buijs, J.; Van Dongen, B.; van der Aalst, W.M. Prom 6: The process mining toolkit. Proc. BPM Demonstr. Track 2010, 615, 34–39.
32. Leppäkoski, A.; Hämäläinen, T.D. PROMOTE: A Process Mining Tool for Embedded System Development. In Proceedings of the International Conference on Product-Focused Software Process Improvement, Trondheim, Norway, 22–24 November 2016; Springer: Cham, Switzerland, 2016; pp. 529–538.
33. Gupta, M.; Serebrenik, A.; Jalote, P. Improving software maintenance using process mining and predictive analytics. In Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), Shanghai, China, 17–22 September 2017; pp. 681–686.
34. Rubin, V.A.; Mitsyuk, A.A.; Lomazova, I.A.; van der Aalst, W.M. Process mining can be applied to software too! In Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, Torino, Italy, 18–19 September 2014; pp. 1–8.
35. Xu, X.; Zhu, L.; Weber, I.; Bass, L.; Sun, D. POD-Diagnosis: Error diagnosis of sporadic operations on cloud applications. In Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, Atlanta, GA, USA, 23–26 June 2014; pp. 252–263.
36. Lübke, D. Extracting and conserving production data as test cases in executable business process architectures. Procedia Comput. Sci. 2017, 121, 1006–1013.
37. Ciccarese, P.; Caffi, E.; Boiocchi, L.; Halevy, A.; Quaglini, S.; Kumar, A.; Stefanelli, M. The NewGuide Project: Guidelines, information sharing and learning from exceptions. In Proceedings of the Conference on Artificial Intelligence in Medicine in Europe, Protaras, Cyprus, 18–22 October 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 163–167.
38. Günther, C.W.; Van Der Aalst, W.M. Fuzzy mining–adaptive process simplification based on multi-perspective metrics. In Proceedings of the International Conference on Business Process Management, Brisbane, Australia, 24–28 September 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 328–343.
39. Li, M.; Liu, L.; Yin, L.; Zhu, Y. A process mining based approach to knowledge maintenance. Inf. Syst. Front. 2011, 13, 371–380.
40. Alvarez, C.; Rojas, E.; Arias, M.; Munoz-Gama, J.; Sepúlveda, M.; Herskovic, V.; Capurro, D. Discovering role interaction models in the Emergency Room using Process Mining. J. Biomed. Inform. 2018, 78, 60–77.
41. Rojas, E.; Sepúlveda, M.; Munoz-Gama, J.; Capurro, D.; Traver, V.; Fernandez-Llatas, C. Question-driven methodology for analyzing emergency room processes using process mining. Appl. Sci. 2017, 7, 302.
42. Basole, R.C.; Park, H.; Gupta, M.; Braunstein, M.L.; Chau, D.H.; Thompson, M. A visual analytics approach to understanding care process variation and conformance. In Proceedings of the 2015 Workshop on Visual Analytics in Healthcare, Chicago, IL, USA, 25 October 2015; pp. 1–8.
43. Trcka, N.; Pechenizkiy, M.; van der Aalst, W. Process Mining from Educational Data (Chapter 9). In Handbook of Educational Data Mining; CRC Press: Boca Raton, FL, USA, 2011.
44. Okoye, K.; Tawil, A.-R.H.; Naeem, U.; Bashroush, R.; Lamine, E. A semantic rule-based approach supported by process mining for personalised adaptive learning. Procedia Comput. Sci. 2014, 37, 203–210.
45. Groba, A.R.; Barreiros, B.V.; Lama, M.; Gewerc, A.; Mucientes, M. Using a learning analytics tool for evaluation in self-regulated learning. In Proceedings of the 2014 IEEE Frontiers in Education Conference (FIE) Proceedings, Madrid, Spain, 22–25 October 2014; pp. 1–8.
46. Van Der Aalst, W. Process mining: Overview and opportunities. ACM Trans. Manag. Inf. Syst. (TMIS) 2012, 3, 1–17.
47. Cho, M.; Song, M.; Comuzzi, M.; Yoo, S. Evaluating the effect of best practices for business process redesign: An evidence-based approach based on process mining techniques. Decis. Support Syst. 2017, 104, 92–103.
48. Syamsiyah, A.; Bolt, A.; Cheng, L.; Hompes, B.F.; Jagadeesh Chandra Bose, R.; van Dongen, B.F.; van der Aalst, W.M. Business process comparison: A methodology and case study. In Proceedings of the International Conference on Business Information Systems, Poznan, Poland, 28–30 June 2017; Springer: Cham, Switzerland, 2017; pp. 253–267.
49. Linoff, G.S.; Berry, M.J. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management; John Wiley & Sons: Hoboken, NJ, USA, 2011.
50. Măruşter, L.; van Beest, N.R. Redesigning business processes: A methodology based on simulation and process mining techniques. Knowl. Inf. Syst. 2009, 21, 267–297.
51. Roldán, J.J.; Olivares-Méndez, M.A.; del Cerro, J.; Barrientos, A. Analyzing and improving multi-robot missions by using process mining. Auton. Robot. 2018, 42, 1187–1205.
52. Ruschel, E.; Santos, E.A.P.; Loures, E.d.F.R. Mining shop-floor data for preventive maintenance management: Integrating probabilistic and predictive models. Procedia Manuf. 2017, 11, 1127–1134.
53. Paszkiewicz, Z. Process mining techniques in conformance testing of inventory processes: An industrial application. In Proceedings of the International Conference on Business Information Systems, Poznan, Poland, 19–20 June 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 302–313.
54. Sutrisnowati, R.A.; Bae, H.; Song, M. Bayesian network construction from event log for lateness analysis in port logistics. Comput. Ind. Eng. 2015, 89, 53–66.
55. Repta, D.; Dumitrache, I.; Sacala, I.S.; Moisescu, M.A.; Stanescu, A.M.; Caramihai, S.I. Automated process recognition architecture for cyber-physical systems. Enterp. Inf. Syst. 2018, 12, 1129–1148.
56. Jans, M.; Alles, M.G.; Vasarhelyi, M.A. A field study on the use of process mining of event logs as an analytical procedure in auditing. Account. Rev. 2014, 89, 1751–1773.
57. Outmazgin, N.; Soffer, P. Business process workarounds: What can and cannot be detected by process mining. In Enterprise, Business-Process and Information Systems Modeling; Springer: Berlin/Heidelberg, Germany, 2013; pp. 48–62.
58. Reijers, H.A.; Song, M.; Jeong, B. On the performance of workflow processes with distributed actors: Does place matter? In Proceedings of the International Conference on Business Process Management, Brisbane, Australia, 24–28 September 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 32–47.
59. Fleig, C.; Augenstein, D.; Maedche, A. Designing a process mining-enabled decision support system for business process standardization in ERP implementation projects. In Proceedings of the International Conference on Business Process Management, Sydney, Australia, 9–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 228–244.
60. Bunge, M. Treatise on Basic Philosophy: Ontology I. Ontology I: The Furniture of the World; Reidel: Boston, MA, USA, 1977; Volume 3.
61. Bunge, M. Treatise on Basic Philosophy: Ontology II. Ontology II: A World of Systems; Reidel: Boston, MA, USA, 1979; Volume 4.
62. Bai, J.; Perron, P. Estimating and testing linear models with multiple structural changes. Econometrica 1998, 66, 47–78.
63. Bai, J.; Perron, P. Computation and analysis of multiple structural change models. J. Appl. Econ. 2003, 18, 1–22.
64. Zeileis, A.; Kleiber, C.; Krämer, W.; Hornik, K. Testing and dating of structural changes in practice. Comput. Stat. Data Anal. 2003, 44, 109–123.
65. Andrews, D.W. Tests for parameter instability and structural change with unknown change point. Econometrica 1993, 61, 821–856.

Figure 1. Examples of process regime shifts in (a) level and (b) trend of the output event log time series data.

Figure 2. A General System Representation.

Figure 3. Procedure to find regime shifts and related determinants.

Figure 4. Rod diameter reduction via turning manufacturing process.

Figure 5. Steel rod diameter data.

Figure 6. Statistical test results (minimal base model). (a) Empirical Fluctuation (b) F-Statistics.

Figure 7. Break date and segmented model.

Figure 8. Statistical test results (updated base model). (a) Empirical Fluctuation (b) F-Statistics.

Figure 9. Fitted model with determinant variable and no regime change detected.

Figure 10. WTI Spot Price (Dollars/Barrel).

Figure 11. Regime Changes in WTI spot price data (minimal base model).

Figure 12. Worldwide Deaths caused by COVID-19 (until 31 December 2021).

Figure 13. Regime Changes in WTI spot price data (base model with COVID-19 variable).

Figure 14. Road accident deaths data for December 2009–December 2019.

Figure 15. Road accident death data (minimal model).

Table 1. Regime shift points.

Seg.	Lower Limit (2.5%)	Observation Number at Which the Shift Occurred	Upper Limit (97.5%)
1	79	80	81

Table 2. Fitted segmented model details (minimal base model (level)).

Model fit details. Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Models	Regime Period	Term	Estimate	Std. Error	t-Value	Pr (>|z|) (Term)	Pr (>|z|) (Regime)
y~level	1–79	Level	100.0	0.05	1988	<2 × 10⁻¹⁶ ***	<2.2 × 10⁻¹⁶ ***
y~level	80–120	Level	105.12	0.07	1477	<2.2 × 10⁻¹⁶ ***	<2.2 × 10⁻¹⁶ ***

Table 3. Fitted Model details (Updated base model).

Model fit coefficients (with breaks). Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Model	Regime Period	Term	Est.	SE	t-Value	Pr (>|z|) (Term)	Pr (>|t|) (Regime)
y~level + ToolTip	1–15	Level	202.4	118.07	118.07	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***
y~level + ToolTip	1–15	ToolTip	−1.03	0.02	−58.76	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***

Table 4. Break dates (minimal base model (trend)).

Seg.	Lower Limit (2.5%)	Observation Number at Which the Shift Occurred	Upper Limit (97.5%)
1	26	27	29
2	53	54	55
3	78	79	81
4	99	102	103
5	127	128	129
6	173	175	176
7	201	202	203
8	229	230	236
9	261	262	263
10	279	280	281
11	297	298	299
12	340	343	344

Table 5. Fitted segmented model details (minimal base model (Trend)).

Model fit details. Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Models	Regime Period	Term	Estimate	Std. Error	z-Value	Pr (>|z|) (Term)	Pr (>|z|) (Regime)
y~level + time	1–15	Level	2.79	0.18	15.29	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***
	1–15	time	0.12	0.02	6.28	3.46 × 10⁻¹⁰ ***	<2 × 10⁻¹⁶ ***
	15–67	Level	3.95	0.09	45.38	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***
	15–67	time	0	0	1.8	0.0713 .	<2 × 10⁻¹⁶ ***
	67–120	Level	5.28	0.16	33.21	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***
	67–120	time	−0.01	0	−6.79	1.12 × 10⁻¹¹ ***	<2 × 10⁻¹⁶ ***
y~level + time + Seatbelts	1–15	Level	2.79	0.18	15.29	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***
		time	0.12	0.02	6.28	3.46 × 10⁻¹⁰ ***
		Seatbelts	-	-	-	-
	15–67	Level	3.98	0.1	41.55	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***
		time	0	0	1.15	0.25
		Seatbelts	0.01	0.01	0.66	0.51
	67–120	Level	5.28	0.16	33.21	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***
		time	−0.01	0	−6.79	1.12 × 10⁻¹¹ ***
		Seatbelts	-	-	-	-
y~level + time + Seatbelts + Red light	1–15	Level	2.79	0.18	15.29	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***
		time	0.12	0.02	6.28	3.46 × 10⁻¹⁰ ***
		Seatbelts	-	-	-	-
		Red-light	-	-	-	-
	15–81	Level	3.97	0.09	44.73	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***
		time	0	0	1.31	0.19
		Seatbelts	0.01	0.01	0.77	0.44
		Red-light	0.01	0.01	1.42	0.16
	81–120	Level	5.55	0.29	19.29	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***
		time	−0.01	0	−4.93	8.25 × 10⁻⁷ ***
		Seatbelts	-	-	-	-
		Red-light	-	-	-	-
y~level + time + Seatbelts + Red light + Speed	1–15	Level	2.79	0.18	15.29	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***
		time	0.12	0.02	6.28	3.46 × 10⁻¹⁰ ***
		Seatbelts	-	-	-	-
		Red-light	-	-	-	-
		Speed	-	-	-	-
	15–120	Level	4.14	0.08	51.35	<2 × 10⁻¹⁶ ***	<2 × 10⁻¹⁶ ***
		time	0	0	−0.72	0.47
		Seatbelts	0.02	0.01	1.73	0.084 .
		Red-light	0.02	0.01	1.67	0.095 .
		Speed	−0.03	0.01	−3.43	0.0006 ***

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
