Business rules let you aggregate the states of other hosts and services into a single “indicator”. Each indicator can provide a dedicated view for users focused on a particular role.
Typical roles:
- Service delivery Management
- Business Management
- Engineering
- IT support
Let’s take a simple example of a service delivery role for an ERP application. It mainly consists of the following IT components:
- 2 databases, in high availability, so with one database active, the service is considered up
- 2 web servers, in load sharing, so with one web server active, the service is considered up
- 2 load balancers, again in high availability
These IT components (Hosts in this example) will be the basis for the ERP service.
With business rules, you can have an “indicator” representing the “aggregated service” state for the ERP service! Shinken already checks all of the IT components one by one including processing for root cause analysis from a host and service perspective.
A business rule is simply a service (or a host) with a “special” check_command named bp_rule. :)
Important
Common gotcha: host status always resolves to either Up, Critical or Unknown. If a host check returns a warning result, the host becomes either Critical or Up, depending on the host configuration. A host cannot be in a warning state. Only services can resolve to Ok, Critical, Warning and Unknown. Learn more about Host and Service states
Because a business rule is an ordinary service or host, it is compatible with all your current habits and UIs. The aggregated state is treated like any other host or service state, so you can get notifications, actions and escalations from it. This means you can have contacts that receive only the notifications relevant to their role.
Warning
You do not have to define the “bp_rule” command, it’s purely internal. You should NOT define it in your checkcommands.cfg file, or the configuration will be invalid due to duplicate commands!
Here is a configuration for the ERP service example, attached to a dummy host named “servicedelivery”.
define service{
    use standard-service
    host_name servicedelivery
    service_description ERP
    check_command bp_rule!(h1,database1 | h2,database2) & (h3,Http1 | h4,Http4) & (h5,IPVS1 | h6,IPVS2)
}
That’s all!
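To make the aggregation concrete, here is a minimal Python sketch of how the expression above could be read. It is only a model, not Shinken's code: it assumes that `|` keeps the best state of its operands, that `&` keeps the worst one, and that states are ordered Ok < Warning < Critical. The helper names `any_of`, `all_of` and `erp_state` are hypothetical.

```python
# Assumption: states are ordered so that a larger number is a worse state.
OK, WARNING, CRITICAL = 0, 1, 2


def any_of(*states):
    """Model of '|': the group is as healthy as its healthiest member."""
    return min(states)


def all_of(*states):
    """Model of '&': the group is as unhealthy as its worst member."""
    return max(states)


def erp_state(s):
    """Mirror of the bp_rule expression in the ERP service definition."""
    return all_of(
        any_of(s["database1"], s["database2"]),
        any_of(s["Http1"], s["Http4"]),
        any_of(s["IPVS1"], s["IPVS2"]),
    )


# One database down: its partner keeps the pair, and so the ERP, available.
example = {
    "database1": CRITICAL, "database2": OK,
    "Http1": OK, "Http4": OK,
    "IPVS1": OK, "IPVS2": OK,
}
print(erp_state(example) == OK)
```

Under these assumptions, the ERP indicator only turns Critical when both members of at least one pair are Critical.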
Note
A complete service delivery view should include an aggregated view of the end user availability perspective states, end user performance perspective states, IT component states, application error states, application performance states. This aggregated state can then be used as a metric for Service Management (basis for defining an SLA).
Warning
For now, business rules handle only one level of () nesting. Please look at the ticket for more details.
In some cases, you know that in a cluster of N elements, you need at least X of them to run OK. This is easily defined: you just need to use the “X of:” operator.
Here is an example of the same ERP but with 3 http web servers, and you need at least 2 of them (to handle the load):
define service{
    use standard-service
    host_name servicedelivery
    service_description ERP
    check_command bp_rule!(h1,database1 | h2,database2) & (2 of: h3,Http1 & h4,Http4 & h5,Http5) & (h6,IPVS1 | h7,IPVS2)
}
It’s done :)
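The “X of:” condition itself boils down to counting. The following Python sketch illustrates only that counting step; it deliberately does not model which state Shinken reports when the condition is not met. The function name `at_least` is hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def at_least(x, states):
    """Return True when at least x of the given states are OK."""
    return sum(1 for s in states if s == OK) >= x


# Three web servers, one of them down: "2 of:" is still satisfied.
print(at_least(2, [OK, CRITICAL, OK]))
# Two of them down: the condition no longer holds.
print(at_least(2, [OK, CRITICAL, CRITICAL]))
```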
You can also define a NOT rule, which negates the state of an element. This can be useful for active/passive setups, for example. You just need to add a ! before your element name.
Example:
define service{
    use generic-service
    host_name servicedelivery
    service_description Cluster_state
    check_command bp_rule!(h1,database1 & !h2,database2)
}
The aggregated state will be okay if database1 is okay and database2 is warning or critical (stopped).
With the simple Xof: form, the only case where you get a “warning” (= “degraded but not dead”) is when all your elements are in warning. But you may well want a warning when 1 of your 3 HTTP servers is critical: the service is still running, but in a degraded state.
Here are some examples of business rules over 5 services A, B, C, D and E, using both the simple form and the three-threshold form, such as 5,1,1of:A|B|C|D|E. The three numbers are, in order, the thresholds for an overall Ok, Warning and Critical state.
A | B | C | D | E |
Warn | Ok | Ok | Ok | Ok |
Rules and overall states:
- 4of: --> Ok
- 5,1,1of: --> Warning
- 5,2,1of: --> Ok
A | B | C | D | E |
Warn | Warn | Ok | Ok | Ok |
Rules and overall states:
- 4of: --> Warning
- 3of: --> Ok
- 4,1,1of: --> Warning
A | B | C | D | E |
Crit | Crit | Ok | Ok | Ok |
Rules and overall states:
- 4of: --> Critical
- 3of: --> Ok
- 4,1,1of: --> Critical
A | B | C | D | E |
Warn | Crit | Ok | Ok | Ok |
Rules and overall states:
- 4of: --> Critical
- 4,1,1of: --> Critical
A | B | C | D | E |
Warn | Crit | Crit | Ok | Ok |
Rules and overall states:
Let’s look at some classic setups for a rule over MAX elements.
- ON/OFF setup: MAXof: <=> MAX,MAX,MAXof:
- Warning as soon as there is a problem, and critical only if all elements are critical: MAX,1,MAXof:
- Worst state: MAX,1,1of:
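The worked tables earlier in this section can be reproduced with a small Python model. This is a sketch under assumptions, not Shinken's implementation: it assumes critical elements also count toward the warning threshold, that the thresholds are checked from worst to best, that the short form Xof: behaves like X,MAX,MAXof: and falls back to the worst element state, and that the three-number form falls back to Ok when no threshold is reached. These choices were made only so that every row listed in the tables comes out as documented; the function name `xof_state` is hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def xof_state(rule, states):
    """Evaluate thresholds 'X' or 'X,Y,Z' against a list of element states."""
    parts = [int(p) for p in rule.split(",")]
    total = len(states)
    if len(parts) == 3:
        is_triple = True
        need_ok, need_warn, need_crit = parts
    elif len(parts) == 1:
        # Assumption: the short form behaves like X,MAX,MAX.
        is_triple = False
        need_ok, need_warn, need_crit = parts[0], total, total
    else:
        raise ValueError("expected 'X' or 'X,Y,Z', got %r" % rule)

    nb_ok = states.count(OK)
    nb_crit = states.count(CRITICAL)
    # Assumption: a critical element is also a problem for the warning count.
    nb_problem = states.count(WARNING) + nb_crit

    # Assumption: the worst threshold that is reached wins.
    if nb_crit >= need_crit:
        return CRITICAL
    if nb_problem >= need_warn:
        return WARNING
    if nb_ok >= need_ok:
        return OK
    # No threshold reached: Ok for the triple form, worst state otherwise.
    return OK if is_triple else max(states)


# First table: A is Warn, the four others are Ok.
row = [WARNING, OK, OK, OK, OK]
for rule in ("4", "5,1,1", "5,2,1"):
    print(rule + "of:", xof_state(rule, row))
```

The model is only checked against the rows listed in the tables; other combinations may behave differently in a real deployment.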