process_visualisation.Rmd
This article gives you an overview of some visualisations that you can generate with rprom.
We use the eventdataR package, which provides several sample event logs, as the data source for these visualisations.
First of all, we build an empty Transition System object:
tsobj = new('TransitionSystem')
Now we pick an event log from the eventdataR package and feed our object with the data:
traffic_fines_df = eventdataR::traffic_fines %>% as.data.frame()
traffic_fines_df %>% head %>% knitr::kable()
case_id | activity | lifecycle | resource | timestamp | amount | article | dismissal | expense | lastsent | matricola | notificationtype | paymentamount | points | totalpaymentamount | vehicleclass | activity_instance_id | .order |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A1 | Create Fine | complete | 561 | 2006-07-24 | 350 | 157 | NIL | NA | NA | NA | NA | NA | 0 | 0.0 | A | 1 | 1 |
A1 | Send Fine | complete | NA | 2006-12-05 | NA | NA | NA | 110 | NA | NA | NA | NA | NA | NA | NA | 2 | 2 |
A100 | Create Fine | complete | 561 | 2006-08-02 | 350 | 157 | NIL | NA | NA | NA | NA | NA | 0 | 0.0 | A | 3 | 3 |
A100 | Send Fine | complete | NA | 2006-12-12 | NA | NA | NA | 110 | NA | NA | NA | NA | NA | NA | NA | 4 | 4 |
A100 | Insert Fine Notification | complete | NA | 2007-01-15 | NA | NA | NA | NA | P | NA | P | NA | NA | NA | NA | 5 | 5 |
A100 | Add penalty | complete | NA | 2007-03-16 | 715 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 6 | 6 |
We can define the completion of an activity as the status of our cases, so we filter for lifecycle to be complete. You also need to specify the critical column headers for the data feeder:
The first and most valuable process visualisation is a process map.
A process map is a directed graph containing nodes and edges. Each node is associated with a status and each edge denotes a transition from one status to another.
Method plot.process.map can be used to plot the process map for the transition system:
tsobj$plot.process.map()
##
## Aggregating nodes ...Done!
##
## Aggregating links ...Done!
Each edge (link or transition) in the process map has a source status and a destination status. By default, method plot.process.map uses rate as the measure shown on the edges. On each edge, you will see the rate (ratio) of transitions from the source status to the destination status, as a percentage of all outgoing transitions from the source status. The number on each node (status) shows the total number of incoming transitions to that status.
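To make the rate measure concrete, here is a minimal base R sketch that derives transition frequencies and rates from a small event log. It does not use rprom and is not how the package computes its measures internally; the toy data and the names event_log, transitions and freq are made up for illustration.

```r
# Toy event log (hypothetical data, not taken from eventdataR)
event_log <- data.frame(
  case_id = c("A", "A", "A", "B", "B", "C", "C", "C"),
  status  = c("Create Fine", "Send Fine", "Payment",
              "Create Fine", "Payment",
              "Create Fine", "Send Fine", "Add penalty"),
  stringsAsFactors = FALSE
)

# Pair each event with the next event of the same case
# (rows are assumed to be ordered by time within each case)
n <- nrow(event_log)
same_case <- event_log$case_id[-n] == event_log$case_id[-1]
transitions <- data.frame(
  source      = event_log$status[-n][same_case],
  destination = event_log$status[-1][same_case],
  stringsAsFactors = FALSE
)

# Frequency of each observed (source, destination) pair
freq <- as.data.frame(table(transitions), stringsAsFactors = FALSE)
freq <- freq[freq$Freq > 0, ]

# Rate: share of each transition among all outgoing transitions of its source
freq$rate <- freq$Freq / ave(freq$Freq, freq$source, FUN = sum)
print(freq)
```

In this toy log, two of the three transitions leaving Create Fine go to Send Fine, so that edge would carry a rate of about 67%.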
When you choose frequency as the measure, the incoming edges of each status show the frequencies of entries to that status, while the outgoing edges show the exit frequencies (transitions from that status to other statuses). This option can be chosen by setting argument measure to freq.
Other than frequencies in the process map, you can also use time as the measure to focus on the durations of statuses and transitions. You can specify the time unit via argument time_unit:
tsobj$plot.process.map(measure = 'time', time_unit = 'hour')
The value on each status (node) is the average time spent in that status. The value on each edge is the average transition time, i.e. the average time spent in the source status before moving to the destination status.
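The two time measures can be illustrated with a short base R sketch. This is only an illustration under assumed data, not rprom's implementation: the toy log and the names event_log, transitions, edge_time and node_time are hypothetical.

```r
# Toy event log with timestamps (hypothetical data)
event_log <- data.frame(
  case_id   = c("A", "A", "A", "B", "B"),
  status    = c("Create Fine", "Send Fine", "Payment",
                "Create Fine", "Payment"),
  timestamp = as.POSIXct(c("2006-07-24 00:00:00", "2006-07-26 00:00:00",
                           "2006-07-27 12:00:00", "2006-08-02 00:00:00",
                           "2006-08-03 00:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Time between consecutive events of the same case, in hours
n <- nrow(event_log)
same_case <- event_log$case_id[-n] == event_log$case_id[-1]
gap_hours <- as.numeric(difftime(event_log$timestamp[-1],
                                 event_log$timestamp[-n], units = "hours"))
transitions <- data.frame(
  source      = event_log$status[-n][same_case],
  destination = event_log$status[-1][same_case],
  hours       = gap_hours[same_case],
  stringsAsFactors = FALSE
)

# Edge measure: average time in the source status before each transition
edge_time <- aggregate(hours ~ source + destination, data = transitions, FUN = mean)

# Node measure: average time spent in each status before leaving it
node_time <- aggregate(hours ~ source, data = transitions, FUN = mean)
print(edge_time)
print(node_time)
```

Here cases stay in Create Fine for 48 and 24 hours, so the node would show an average of 36 hours, while each outgoing edge shows its own average.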
Sometimes showing all transitions makes the process map look noisy and complicated. We may need to filter out some cases with unusual traces in order to get a clearer visualisation of the process.
The most common filtering is to filter out cases which have uncommon transitions. These are what we call noisy cases. These cases follow traces different from the most frequent traces of the process and thus deviate from the main process.
Case filtering can be done via method set.filter.case. To filter out noisy cases, you can set argument freq_rate_cut to a value between 0 and 1. Usually, a value close to 1.0 is chosen. For example, if you set it to 0.9, the 10% of cases with the least frequent traces will be eliminated:
tsobj$set.filter.case(freq_rate_cut = 0.9)
tsobj$plot.process.map()
##
## Aggregating nodes ...Done!
##
## Aggregating links ...Done!
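The idea behind a frequency rate cut can be sketched in base R. This is not rprom's implementation: the toy log, the variable names and the exact cut rule used here (keep the most frequent traces while their cumulative share of cases stays within the cut) are assumptions made for illustration.

```r
# Toy event log (hypothetical data): four cases share one trace, one deviates
event_log <- data.frame(
  case_id = c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "E"),
  status  = c("Create", "Pay", "Create", "Pay", "Create", "Pay",
              "Create", "Pay", "Create", "Appeal", "Pay"),
  stringsAsFactors = FALSE
)

# One trace string per case (rows are assumed to be in time order)
traces <- sapply(split(event_log$status, event_log$case_id),
                 paste, collapse = " > ")

# Trace frequencies, most frequent first, with cumulative share of cases
trace_freq <- sort(table(traces), decreasing = TRUE)
cum_share  <- cumsum(as.numeric(trace_freq)) / length(traces)

# Keep traces whose cumulative share stays within the cut (assumed rule)
freq_rate_cut <- 0.9
kept_traces   <- names(trace_freq)[cum_share <= freq_rate_cut]
kept_cases    <- names(traces)[traces %in% kept_traces]
filtered_log  <- event_log[event_log$case_id %in% kept_cases, ]
print(unique(filtered_log$case_id))
```

With only five cases, dropping the single deviating case removes 20% of the log; on a larger log the removed share would sit closer to the 10% implied by the cut.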
Once you apply a case filter to your object, all measures and charts are affected by the filtering until you run method reset.filters or run set.filter.case with different filtering arguments. For complete information about case filtering, refer to the package API reference.
rprom currently supports two plotters for rendering the process map graph: grviz and visNetwork. The default plotter is grviz, which uses the JavaScript package Graphviz and is available via the R package rvis. To use plotter visNetwork, you need to have the R package visNetwork installed:
if(!require(visNetwork)){install.packages('visNetwork')}
visNetwork plots become noisy and unreadable when the graph is large, so it is best to apply a frequency rate cut before plotting to filter out cases with low-frequency traces and simplify the process:
tsobj$reset.filters()
tsobj$set.filter.case(freq_rate_cut = 0.8)
tsobj$plot.process.map(plotter = 'visNetwork')
##
## Aggregating nodes ...Done!
##
## Aggregating links ...Done!
## Loading required package: visNetwork
The layout of the plot generated by grviz looks better. On the other hand, visNetwork lets you drag the nodes and enable physics.
You can customize your plot by passing parameters to the config argument.
For example, for the visNetwork plot, you can select the hierarchical layout, change the direction and the edge smoothing type, and enable physics for nodes and edges:
tsobj$reset.plots()
customized = list(layout = 'hierarchical', node.physics.enabled = TRUE, direction = 'left.right',
                  link.smooth = list(enabled = TRUE, type = 'curvedCCW'))
tsobj$plot.process.map(plotter = 'visNetwork', config = customized, width = "800px", height = "1200px")
You can refer to the documentation of the rvis package to see how to specify argument config in order to customize your plot. rvis uses a single configuration format for all the plotters it supports. You can also use visNetwork functions directly to customize your plot. For example, you can change the edge arrow sizes:
tsobj$reset.plots()
tsobj$plot.process.map(plotter = 'visNetwork') %>%
visNetwork::visEdges(arrows = list(to = list(enabled = TRUE, scaleFactor = 1.5)))
For complete instructions, refer to the documentation of the visNetwork package.
A Sankey chart is another type of visualisation, which shows the process with more emphasis on the process flows.
tsobj %>% plot_process_sankey()
## Loading required package: networkD3
Another type of visualisation ideal for showing a process is an interactive Sankey tree.
To generate this visualisation, you need to install the packages sankeytreeR and treemap. Package treemap can be installed from CRAN, while sankeytreeR needs to be installed from GitHub:
if(!require(treemap)){install.packages('treemap')}
## Loading required package: treemap
if(!require(d3r)){install.packages('d3r')}
## Loading required package: d3r
if(!require(sankeytreeR)){devtools::install_github('https://github.com/timelyportfolio/sankeytree.git')}
## Loading required package: sankeytreeR
tsobj %>% plot_process_tree()
##
## Aggregating traces ...Done!