This is an Aviate-only feature. |
Introduction
Aviate Health is a feature provided by the Aviate plugin. It is designed to give users valuable insights into the overall health and performance of a Kill Bill installation. As part of this feature, the aviate plugin exposes dedicated health-related endpoints that allow you to monitor and manage various aspects of your Kill Bill setup. Some of the capabilities offered by this feature include viewing detailed health metrics for all nodes within a Kill Bill installation, identifying and resolving issues such as stuck bus or notification entries, and generating comprehensive diagnostic reports to aid in troubleshooting and system optimization.
Getting Started with Aviate Health
This section provides a step-by-step approach to start using Aviate Health.
Installing the Plugin
The Aviate plugin can be installed as documented in the How to Install the Aviate Plugin doc.
Enabling Aviate Health
When the Aviate plugin is installed, Aviate Heath is enabled by default.
The following configuration property controls this feature:
com.killbill.billing.plugin.aviate.enableHealthReporter=true
Refer to the Kill Bill Configuration Guide to know more about setting configuration properties.
Using Health APIs
As mentioned earlier, Aviate Health exposes endpoints that allow monitoring the health of a KB installation and fixing problems if any. Once the aviate plugin is installed, you can start using the Aviate Health APIs. These are documented in our api docs.
Aviate Metrics
The Aviate plugin provides a Retrieve Metrics endpoint that can be used to assess the health of the system.
This section provides some insights into the metrics that can be retrieved via this endpoint.
Metrics Overview
The metrics exposed by the Aviate plugin can mainly be categorized in the following event groups (types):
-
Gauge - A gauge return a single numerical value per minute/hour/day (based on the value of the
granularity
parameter) -
Meter - A meter provide the rate over time. They provide different data points for the following sample kinds:
Sample Kind Description count
Monotonic increasing value since last reboot.
{one_minute/five_minute/fifteen_minute}_rate
Rate through a window of time.
mean_rate
Mean rate since last reboot.
-
Timer - A timer measures the rate of events and the duration of those events. They provide different data points for the following sample kinds:
Sample Kind Description mean_rate
Mean rate since last reboot.
{one_minute/five_minute/fifteen_minute}_rate
Rate through a window of time.
tp99, tp999, tp75, tp98, tp95
Percentile for the metrics since last reboot.
min
Min value since last reboot.
max
Max value since last reboot.
count
Monotonic increasing value since last reboot.
median
Median value since last reboot.
std_dev
Standard deviation since last reboot.
Queue Metrics
Queue metrics can be used to assess the health of the Kill Bill internal queues.
Kill Bill has its own internal queues used to dispatch events. Events that are dispatched right away as a result of some internal state being created or updated are called bus events - e.g. a subscription_creation event is generated as a result of creating a new subscription. Events that are scheduled to be dispatched in the future are called notifications - e.g. invoice scheduled on a periodic basis matching account settings and plan billing periods. The health of these internal queues is critical to maintaining correct functioning of the system.
Note that the metrics associated with the queues are global (as opposed to computed per node) so the nodeName
query parameter will be ignored. Additionally, all the queue metrics are of Gauge
and therefore return a single value.
The following table lists these metrics:
Metric Name | Description |
---|---|
queue.bus.late |
This is a counter that shows how many unprocessed bus events we have at time 't'. This number should be close to 0. |
queue.bus.incoming |
The is rate of incoming bus events at time t. The default granularity is |
queue.bus.processing |
This is an estimation of the time in mSec that was used to process the bus event. These values may become incorrect once we have late bus entries. |
queue.notifications.late |
This is a counter that shows how many unprocessed notifications we have at time 't'. This number should be close to 0. |
queue.notifications.incoming |
The is rate of incoming notifications at time t. The default granularity is |
queue.notifications.processing |
This is an estimation of the time in mSec that was used to process the notification. These values may become incorrect once we have late notifications. |
Logs
Kill Bill is configured to output its internal logs as specified by the logback.xml
configuration (See docs). The aviate plugin running on each node extracts important information from the logs and computes some metrics to highlight potential issues with warn
and error
logs that have happened through time.
Those metrics are computed per node. Additionally, the log metrics are all Meter
metrics, and so they each provide different time series as specified by the Sample Kind
listed above.
The following table lists these metrics:
Metric Name | Description |
---|---|
logs.rates.warn |
Represents warnings in logs |
logs.rates.error |
Represents errors in logs |
Servlet Responses
Servlet metrics provide visibility into any of the endpoints exposed by the system, either from Kill Bill core (/1.0/kb
) or any plugins exposing endpoints.
These metrics are computed per node. The servlet metrics are Meter
metrics, so they each provide different time series as specified by the Sample Kind
listed above.
The following table lists these metrics:
Metric Name | Description |
---|---|
servlets.responses.ok |
Represents successful responses. |
servlets.responses.created |
Represents created responses. |
servlets.responses.badRequest |
Represents bad request responses. |
servlets.responses.noContent |
Represents no content responses. |
servlets.responses.notFound |
Represents not found responses. |
servlets.responses.serverError |
Represents server error responses. |
servlets.responses.other |
Represents other responses. |
Database Connection Pools
Kill Bill uses 3 different database connection pools: main
, shiro
, and osgi
. main
and shiro
are internal connection pools within Kill Bill core. The osgi
connection pool is used by the plugins running on top of the Kill Bill platform for any database calls.
The connection pool metrics are computed per node. The following metrics are Gauge
metrics and so they return a single value:
Metric Name | Description |
---|---|
main.pool.TotalConnections |
Total (created) connections at time 't' for |
main.pool.ActiveConnections |
Active (in use) connections at time 't' for |
main.pool.IdleConnections |
Idle connections at time 't' for |
osgi.pool.TotalConnections |
Total (created) connections at time 't' for |
osgi.pool.ActiveConnections |
Active (in use) connections at time 't' for |
osgi.pool.IdleConnections |
Idle connections at time 't' for |
shiro.pool.TotalConnections |
Total (created) connections at time 't' for |
shiro.pool.ActiveConnections |
Active (in use) connections at time 't' for |
shiro.pool.IdleConnections |
Idle connections at time 't' for |
The following metrics are timer metrics and provide different data points for the sample kinds listed above:
Metric Name | Description |
---|---|
main.pool.Wait |
Wait time to get a connection from the pool. |
osgi.pool.Wait |
Wait time to get a connection from the pool. |
shiro.pool.Wait |
Wait time to get a connection from the pool. |
Retrieve Diagnostic Report Examples
The Retrieve Diagnostic Report endpoint generates a diagnostic report. Such a report, when shared with the Kill Bill team, allows the team to replicate a user’s Kill Bill environment and diagnose issues.
This section provides examples of generating the diagnostic report with various parameters.
With Account Data Only
In order to include the account data, the accountId
parameter needs to be specified:
curl -X GET \
-H 'X-killbill-apiKey: bob' \
-H 'X-killbill-apisecret: lazar' \
-H "Accept: application/zip" \
-H "Authorization: Bearer ${ID_TOKEN}" \
"http://127.0.0.1:8080/plugins/aviate-plugin/v1/health/diagnostic?accountId=88edb5ba-d613-4923-84dc-b3cee1bf0b42" -JO
This command generates a diagnostic file that includes a file called account.data
. This has data from all the KB tables for the specified accountId.
With Configuration Data
The configuration data is controlled via the withKillbillConfig
, withTenantConfig
and the withSystemConfig
query parameters:
curl -X GET \
-H 'X-killbill-apiKey: bob' \
-H 'X-killbill-apisecret: lazar' \
-H "Accept: application/zip" \
-H "Authorization: Bearer ${ID_TOKEN}" \
"http://127.0.0.1:8080/plugins/aviate-plugin/v1/health/diagnostic?accountId=d0b3e0b9-f4bc-43d4-8001-1969b03b4555&withKillbillConfig=true&withTenantConfig=true" \
-JO
This command generates a diagnostic file with the killbill_configuration.data
(which includes the global KB configuration), and tenant_config.data
(which includes the per-tenant configuration properties like catalog, overdue, etc.) files.
With KB Logs
In order to include the logs in the diagnostic file, the following parameters need to be specified:
-
withLogs=true
-
logsDir=path of the directory from which to include the log files. If unspecified, this defaults to the value of
com.killbill.billing.plugin.aviate.logsPath
property. If this property is not specified, it defaults toTOMCAT_HOME/logs
whereTOMCAT_HOME
is determined via thecatalina.base
system property -
logsFilenames=name of the log file that needs to be included. If multiple log files are required, this parameter needs to be included multiple times corresponding to each log file as shown in the examples below. If unspecified, this defaults to
catalina.out
.
Depending on the type of KB installation, you can use different curl
commands to include the KB logs.
Here are some examples.
Tomcat Setup
Example 1
On a Tomcat setup, the KB/Kaui logs are created in the directory specified in the logback.xml
as explained here. Thus, to include these logs, you need to specify the logsDir
parameter with the path specified in logback.xml
as follows:
curl -X GET \
-H 'X-killbill-apiKey: bob' \
-H 'X-killbill-apisecret: lazar' \
-H "Accept: application/zip" \
-H "Authorization: Bearer ${ID_TOKEN}" \
"http://127.0.0.1:8080/plugins/aviate-plugin/v1/health/diagnostic?accountId=88edb5ba-d613-4923-84dc-b3cee1bf0b42&withLogs=true&logsDir=/var/lib/killbill/&logsFilenames=killbill.out&logsFilenames=kaui.out" -JO
This command generates a diagnostic file with the account data and the killbill.out
, kaui.out
log files from the /var/lib/killbill
directory.
Example 2:
As explained earlier, if the logsDir
parameter is not specified, it defaults to the value of the com.killbill.billing.plugin.aviate.logsPath
property. Assuming that the KB log files are created in the /var/lib/killbill/logs
directory and that the com.killbill.billing.plugin.aviate.logsPath=/var/lib/killbill/logs
property is set, you can use the following command to include the KB/Kaui log files:
curl -X GET \
-H 'X-killbill-apiKey: bob' \
-H 'X-killbill-apisecret: lazar' \
-H "Accept: application/zip" \
-H "Authorization: Bearer ${ID_TOKEN}" \
"http://127.0.0.1:8080/plugins/aviate-plugin/v1/health/diagnostic?accountId=88edb5ba-d613-4923-84dc-b3cee1bf0b42&withLogs=true&logsFilenames=killbill.out&logsFilenames=kaui.out" -JO
This command generates a diagnostic file with the account data and the killbill.out
, kaui.out
log files from the /var/lib/killbill/logs
directory.
Example 3
If the com.killbill.billing.plugin.aviate.logsPath
property is not set and the logsDir
parameter is not specified, it defaults to the TOMCAT_HOME/logs
directory. Thus, the localhost.2025-01-01.log
file from the Tomcat logs directory can be included using the following command:
curl -X GET \
-H 'X-killbill-apiKey: bob' \
-H 'X-killbill-apisecret: lazar' \
-H "Accept: application/zip" \
-H "Authorization: Bearer ${ID_TOKEN}" \
"http://127.0.0.1:8080/plugins/aviate-plugin/v1/health/diagnostic?accountId=88edb5ba-d613-4923-84dc-b3cee1bf0b42&withLogs=true&logsFilenames=localhost.2025-01-01.log" -JO
This command generates a diagnostic file with the account data and the localhost.2025-01-01.log
file from the Tomcat home directory.
Example 4
As mentioned earlier, if the logsFilenames
parameter is not specified, it defaults to catalina.out
. Thus, to include this file, you can use:
curl -X GET \
-H 'X-killbill-apiKey: bob' \
-H 'X-killbill-apisecret: lazar' \
-H "Accept: application/zip" \
-H "Authorization: Bearer ${ID_TOKEN}" \
"http://127.0.0.1:8080/plugins/aviate-plugin/v1/health/diagnostic?accountId=88edb5ba-d613-4923-84dc-b3cee1bf0b42&withLogs=true" -JO
This command generates a diagnostic file with the account data and the catalina.out
file from the Tomcat home directory.
Docker/AWS Setup
In case of a Docker/AWS setup, KB/Kaui logs are created in the $TOMCAT_HOME/logs
directory (which is /var/lib/tomcat/logs
). Thus, there is no need to explicitly specify the logsDir
parameter to include these logs. So you can use:
curl -X GET \
-H 'X-killbill-apiKey: bob' \
-H 'X-killbill-apisecret: lazar' \
-H "Accept: application/zip" \
-H "Authorization: Bearer ${ID_TOKEN}" \
"http://127.0.0.1:8081/plugins/aviate-plugin/v1/health/diagnostic?accountId=0dc4f698-c6cd-49c6-90e7-f745c70f96cb&withLogs=true&logsFilenames=killbill.out" -JO
This command generates a diagnostic file with the account data and the killbill.out
log file from the /var/lib/tomcat/logs
directory.