Introduction

A random walk on a graph is a process that begins at some status (vertex, node or state), and at each time step moves to another status.

When the graph edges are unweighted, the transition destination status is chosen uniformly at random among the neighbors of the current status. cases transition to a neighbor status with probability proportional to the weight of the corresponding edge. While the case traces (list of statuses in the order they are visited) of a particular random walk is sometimes of interest, it is often more productive to reason about the expected behavior of a random walk. To this end, we will investigate the probability distribution over the statuses after a certain number of steps.

In this article, we use rprom to study random walks on a process map represented by a directed graph.

Transition Matrix

We will let the vector \(\vec{p}^{t} \in \mathbb{R}^n\) denote the probability vector at time \(t\) and \(p^{(t)}_{A}\) denote the probability of being in state \(A\) at time \(t\).
The status probabilities must add up to \(1.0\).

To obtain the status probability distribution after each transition, we simply mutiply the Transition Matrix also known as Adjacency Matrix by the current probability vector: \[\vec{p}^{(t)} * A = \vec{p}^{(t+1)}\]

Get Transition Matrix from Transition System objects

Here, we will show you how to obtain te Transition(Adjacency) Matrix from an onject of class TransitionSystem.

We used dataset hospital_billing from package eventdata for this study as it contains case status data in column state. To shorten the status labels, we renamed them to their abbreviated form. You can make any changes to the eventlog history of a TransitionSystem object directly but you will need to reset the object after that:

hospital_df = eventdataR::hospital_billing %>% as.data.frame
status_mapper = c('In progress' = 'IP', 'Closed' = 'CL', 'Released' = 'RE', 'Billed' = 'B', 
                  'Invoice rejected' = 'IR', 'Billable' = 'BL', 'Empty' = 'E', 'Rejected' = 'RJ',
                  'Unbillable' = 'UB', 'Check' = 'C')
mapped_rows = which(hospital_df$state %in% intersect(hospital_df$state, names(status_mapper)))
hospital_df$state[mapped_rows] = status_mapper[hospital_df$state[mapped_rows]]

tsobj = new('TransitionSystem')
tsobj$feed.transition_events(hospital_df, 
                    caseID_col = 'case_id', 
                    status_col = 'state',
                    startTime_col = 'timestamp')
## Warning:  
##  
##  1 events removed due to missing values in column caseID 
##  
## Warning:  
##  
##  7853 events removed due to missing values in column status 
## 

Let’s have a look at the process map for this Transition System:

tsobj$plot.process.map(width = "1000px", height = "800px")
## 
##  Aggregating nodes ...Done! 
## 
##  Aggregating links ...Done!

The value of cell in row \(i\) and column \(j\) of matrix A is the probability of the walk at status \(i\) selecting the edge to status \(j\).

You can get the Transition Matrix of the process represented by a Transition System
by calling method get.transition_matrix.

 tsobj$get.transition_matrix()
##       START          IP          CL          RE            B         IR
## START     0 1.000000000 0.000000000 0.000000000 0.000000e+00 0.00000000
## IP        0 0.399034142 0.436657682 0.000000000 5.615454e-05 0.00000000
## CL        0 0.005959929 0.008369262 0.980344915 0.000000e+00 0.00000000
## RE        0 0.073587053 0.005436844 0.020230118 8.874700e-01 0.00000000
## B         0 0.000000000 0.000000000 0.000941873 2.556512e-03 0.03040904
## IR        0 0.000000000 0.000000000 0.000000000 0.000000e+00 0.01310044
## BL        0 0.024048096 0.002004008 0.004008016 7.875752e-01 0.00000000
## E         0 0.005586592 0.000000000 0.000000000 0.000000e+00 0.00000000
## RJ        0 0.857142857 0.000000000 0.100000000 0.000000e+00 0.00000000
## UB        0 0.011904762 0.000000000 0.023809524 0.000000e+00 0.00000000
## C         0 0.000000000 0.000000000 0.000000000 0.000000e+00 0.00000000
## END       0 0.000000000 0.000000000 0.000000000 0.000000e+00 0.00000000
##                 BL           E        RJ           UB            C         END
## START 0.0000000000 0.000000000 0.0000000 0.0000000000 0.0000000000 0.000000000
## IP    0.0034815813 0.009827044 0.0000000 0.0003369272 0.0000000000 0.150606469
## CL    0.0000000000 0.000000000 0.0000000 0.0002536140 0.0000000000 0.005072280
## RE    0.0001264382 0.000000000 0.0000000 0.0079656088 0.0001264382 0.005057529
## B     0.0349838536 0.000000000 0.0000000 0.0000000000 0.0000000000 0.931108719
## IR    0.6550218341 0.000000000 0.3056769 0.0262008734 0.0000000000 0.000000000
## BL    0.0521042084 0.000000000 0.0000000 0.0060120240 0.0000000000 0.124248497
## E     0.0000000000 0.022346369 0.0000000 0.0000000000 0.0000000000 0.972067039
## RJ    0.0000000000 0.000000000 0.0000000 0.0428571429 0.0000000000 0.000000000
## UB    0.0000000000 0.000000000 0.0000000 0.0119047619 0.0000000000 0.952380952
## C     0.0000000000 0.000000000 0.0000000 0.0000000000 0.0000000000 1.000000000
## END   0.0000000000 0.000000000 0.0000000 0.0000000000 0.0000000000 1.000000000

Random Walk Steps

So let’s consider one case which is in status IP (In Progress) and see how is its status likely to change in the next steps.

The initial probability vector is \(\vec{p}^{(0)} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)\) in which the second element associated with status IP is \(1\). The probability distribution after one step is:

A  = tsobj$get.transition_matrix()

p0 = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
p1 = p0 %*% A
print(p1)
##      START        IP        CL RE            B IR          BL           E RJ
## [1,]     0 0.3990341 0.4366577  0 5.615454e-05  0 0.003481581 0.009827044  0
##                UB C       END
## [1,] 0.0003369272 0 0.1506065

which is the second row of the Transition Matrix associated with status IP.
As we take further steps, the probability distribution changes. For example to see the probability distribution after 10 steps, we multiply the transition matrix 10 times recursively:

p = list(p1)
for (i in 1:9){
  p[[i + 1]] = p[[i]] %*% A
}
print(p[[10]])
##      START          IP          CL          RE           B           IR
## [1,]     0 0.002023776 0.001573445 0.002776751 0.005320415 0.0002836849
##               BL            E           RJ           UB            C       END
## [1,] 0.000693773 3.561179e-05 0.0001391668 7.287324e-05 6.183233e-07 0.9870799

TransitionSystem object has a method which computes the random walk probability distributions given the initial state and number of steps:

knitr::kable(tsobj$random_walk(starting_status = 'IP', num_steps = 10))
START IP CL RE B IR BL E RJ UB C END
0 0.3990341 0.4366577 0.0000000 0.0000562 0.0000000 0.0034816 0.0098270 0.0000000 0.0003369 0.00e+00 0.1506065
0 0.1619733 0.1779028 0.4280972 0.0027646 0.0000017 0.0015726 0.0041409 0.0000000 0.0002701 0.00e+00 0.2232767
0 0.0972598 0.0745465 0.1830819 0.3811781 0.0000841 0.0007978 0.0016843 0.0000005 0.0035225 5.41e-05 0.2577905
0 0.0527977 0.0440901 0.0772312 0.1640880 0.0115924 0.0137935 0.0009934 0.0000257 0.0015590 2.31e-05 0.6338059
0 0.0273919 0.0238711 0.0450354 0.0798262 0.0051416 0.0142460 0.0005410 0.0035435 0.0010505 9.80e-06 0.7993430
0 0.0177820 0.0124341 0.0248246 0.0513930 0.0024948 0.0070038 0.0002813 0.0015717 0.0007588 5.70e-06 0.8814503
0 0.0105227 0.0080177 0.0129436 0.0276795 0.0015955 0.0038620 0.0001810 0.0007626 0.0003908 3.10e-06 0.9340414
0 0.0059514 0.0047400 0.0082491 0.0146001 0.0008626 0.0022529 0.0001075 0.0004877 0.0002110 1.60e-06 0.9625361
0 0.0034854 0.0026877 0.0048903 0.0091328 0.0004553 0.0012149 0.0000609 0.0002637 0.0001285 1.00e-06 0.9776794
0 0.0020238 0.0015734 0.0027768 0.0053204 0.0002837 0.0006938 0.0000356 0.0001392 0.0000729 6.00e-07 0.9870799

Looking at the probabilities of each status versus the step number, we can see which statuses cases are pulled towards status END as the step number increases.

tsobj$random_walk(starting_status = 'IP', num_steps = 10) %>% 
  rvisPlot(x = list(step_no = 1:10), y = as.list(rownames(A)), plotter = 'plotly', type = 'scatter', shape = 'line.point')

Periodic Transition System

In the previous study, we could see how probability distribution among statuses change as we walk through the process graph. We could see which statuses are most likely met in each stage of the process, however this does not give us any view of how the process status changes over time because status transitions can happen at any time and durations cases spend at statuses can be very different.

Periodic Transition Systems, help us consider time in the modelling of the process. If the time interval between transitions are constant, number of steps represents time elapsed and we will be able to monitor status probabilities over time.

Periodic Transition System assumes a status transition at the start of a period of time like a day for example. As you know we can always have a transition from one status to itself, so at the beginning of each time period (day), a transition happens: if the case is still in the same status as yesterday, a same-status-transition event is triggered.

The TransitionSystem object has a method named to_periodic which converts itself into a periodic transition system and returns it as a new object. We can use that object to generate the transition matrix and observe probability distributions over time. Let’s see how this works:

ts_monthly = tsobj$to_periodic('month')
## 
##  Verifications started ... Done! 
## 
##  Managing event times ... Done! 
## 
##  Creating time ends and event attribute maps ... Done! 
## 
##  Adding clock events ... Done! 
## 
##  Creating variable last value table ... Done!
ts_monthly$random_walk(starting_status = 'IP', num_steps = 10) %>% 
  rvisPlot(x = list(month_no = 1:10), y = as.list(ts_monthly$get.statuses()), plotter = 'plotly', type = 'scatter', shape = 'line')
## 
##  Aggregating links ...Done!

Well, you can see how the probabiolity distribution is being pulled towards the END status as time increases.

In the next study, we will learn how to run a simulation to monitor the future of the process.