Healthchecks and metrics

Kill Bill exposes both healthchecks and metrics. These can be accessed via:

curl -v http://<IP_ADDRESS>:8080/1.0/healthcheck?pretty=true
curl -v http://<IP_ADDRESS>:8080/1.0/metrics?pretty=true

Remote debugging

Specify -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=12345 on startup in your JVM settings to enable remote debugging (this is set by default in our Docker images). You can then connect to Kill Bill on port 12345 from your debugger (e.g. Eclipse or IntelliJ).

If Kill Bill is running locally, you can just specify localhost or 127.0.0.1 as the remote host in your debugger. To debug instances remotely (e.g. in the cloud), you need to specify the publicly accessible IP address of your instance. For Docker, this should be the public IP address of the container. If you are using docker-machine and your machine is named default, you can retrieve the IP address via docker-machine inspect default -f '{{.Driver.IPAddress}}'. Passing this IP address to our Docker images as ENV_HOST_IP enables remote debugging automatically.

Make sure also that the port 12345 is open (e.g. check your security groups in AWS). For Docker, expose the port via -p12345:12345.

Turning on DEBUG logging

Specify:

-Dcom.sun.management.jmxremote=true
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=8000
-Dcom.sun.management.jmxremote.rmi.port=8000
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=localhost

on startup in your JVM settings to enable access via jconsole.

You can set at runtime the log level of specific classes and packages (look for the ch.qos.logback.classic:Name=default,Type=ch.qos.logback.classic.jmx.JMXConfigurator MBean).

Similar considerations as the remote debugging section apply regarding the IP address and expose ports (see above). For Docker, java.rmi.server.hostname should point to the host IP address (e.g. 192.168.99.100). Passing this IP address to our Docker images as ENV_HOST_IP enables jConsole automatically.

Following tokens

To understand why a specific action was taken, it is often useful to follow tokens.

When invoking a JAX-RS (HTTP) API, a userToken is automatically generated. You can also specify your own token via the X-Request-Id HTTP header (it must be a UUID).

Once set, this token is:

  • attached to the processing thread

  • attached to SLF4J MDC context as kb.userToken and displayed by default in logs (tok=)

  • passed to all APIs as CallContext#getUserToken()

  • passed to internal bus events as BusEvent#getUserToken()

  • passed to external bus events as ExtBusEvent#getUserToken()

  • passed to future notifications as userToken

Note that when a notification is created, a futureUserToken is auto-generated. That futureUserToken becomes the userToken when the notification is dispatched.

You can search for these tokens in Kaui, on the Queues screen, accessible from the tenant page or via the per-account sub-menu (you need to be logged-in as the Kaui root user).

Testing changes over time

We have a clock abstraction in Kill Bill that can be manipulated through an API, as long as you start Kill Bill with org.killbill.server.test.mode=true (-e KILLBILL_SERVER_TEST_MODE=true in Docker):

curl -v \
     -u admin:password \
     -H "X-Killbill-ApiKey: bob" \
     -H 'X-Killbill-ApiSecret: lazar' \
     -H 'X-Killbill-CreatedBy: admin' \
     "http://127.0.0.1:8080/1.0/kb/test/clock"

curl -v \
     -u admin:password \
     -H "X-Killbill-ApiKey: bob" \
     -H 'X-Killbill-ApiSecret: lazar' \
     -H 'X-Killbill-CreatedBy: admin' \
     -H "Accept: application/json" \
     -X POST \
     "http://127.0.0.1:8080/1.0/kb/test/clock?requestedDate=2015-12-14T23:02:15.000Z"

curl -v \
     -u admin:password \
     -H "X-Killbill-ApiKey: bob" \
     -H 'X-Killbill-ApiSecret: lazar' \
     -H 'X-Killbill-CreatedBy: admin' \
     -H "Accept: application/json" \
     -X PUT \
     "http://127.0.0.1:8080/1.0/kb/test/clock?days=10"

Here is an example how it could be used in tests.

Docker and Ansible

Our Docker images use Ansible behind the scenes to install and configure Kill Bill in the container. The actual Ansible commands run are available in the images via the environment variables $KPM_INSTALL_CMD and $START_TOMCAT_CMD.

If you need to debug this setup process, you can start a container via docker run -ti killbill/killbill:0.20.0 bash and execute these commands manually.

Performance and profiling

Kill Bill has some built-in support to profile incoming calls. The mechanism is based on HTTP headers that need to be set to specify what needs to be profiled, and as a result, a new HTTP header is returned.

Types of profiling

In order to return profiling information, one needs to add the HTTP header X-Killbill-Profiling-Req and specify which calls should be instrumented. Today, we support:

  • JAXRS: The HTTP call will be returned with the time spent in the call.

  • API: Any Kill Bill API call will be returned with the time spent in the call

  • DAO: Any Kill Bill DAO (request to database) call will be returned with the time spent in the call

  • DAO_DETAILS: Any Kill Bill DAO (request to database) call along with specifics about time time spent for audit/history and caching will be returned with the time spent in the call

  • DAO_CONNECTION: The time spent trying to acquire a database connection

  • GLOCK: The time spent trying to acquire the global (Account) lock

  • PLUGIN: Any Kill Bill call spent when calling a plugin will be returned with the time spent in the call

JAXRS Profiling

The time is extracted from a Jersey filter, so it should be very close to the time spent into the JAX-RS application handler, not taking into account the time spent in the container itself (i.e. Jetty, Tomcat, …​).

API Profiling

We are relying on AOP to profile each of our API calls. We are injecting a Guice AOP module to profile each of our API calls. The time returned does not include the extra time spent to dispatch the request using the AOP handler.

In addition, there is another AOP module that is used to validate the user has the right set of permissions to execute this call. Since this occurs prior we do the profiling this time is currently not included in the profiling of the API call.

Finally when making API calls from plugins, there may be also a small overhead to go through OSGI layer and this time is also not included.

DOA Profiling

Most of the requests issued from Kill Bill Dao (e.g DefaultPaymentDao) classes follow the same mechanism:

  1. A transaction is started by calling the execute method of the EntitySqlDaoTransactionalJdbiWrapper. Here, we create a Handle, essentially retrieving a database connection. The time taken to get the connection can be profiled by using the DAO_CONNECTION mask.

  2. Then, a jdbi transaction is started using cglib, and this transaction is actually run through a series of handlers, such as our RestartTransactionRunner. This extra time, is not being measured by our profiling layer.

  3. Finally, for each SQL operation within the transaction, we also proxy the calls using the java Proxy mechanism, and so all those calls are funneled trough the associated EntitySqlDaoWrapperInvocationHandler, and this is where the timing associated to the DAO (and DAO_DETAILS) mask is taken.

Note that the profiling time reported by the API mask — since it supersedes all dao calls — will include everything described above. Therefore it is possible to see an API that would only make a dao call take slightly longer than the dao call itself.

The DAO mask will take the total time it takes to run the invocation handler invoke method. Because there are several paths, we also include the option to return DAO_DETAILS:

  • In most cases, we will have a profiling entry DAO_DETAILS:<method> (raw) that only profiles the call requested.

  • In the case of an insert, update, deletion, the code will also update audit logs, and history table. In this case, we will also see a profiling entry DAO_DETAILS:<method> (history/audit) that will return the time it took to create the additional records in these tables.

  • In the case of a cacheable query (e.g return a record_id associated to an object uuid), we may return the result from the cache and so this is the only case where DAO_DETAILS:<method>:raw will not be printed.

  • In all other cases, the DAO_DETAILS:<method> (raw) should match the DAO:<method> time.

GLOCK Profiling

The GLOCK mask will simply output the time it took to grab the database lock associated with a given Account — the only kind of global lock currently supported in Kill Bill.

PLUGIN Profiling

This mask can be used to profile the time it takes for Kill Bill to make calls to a given plugin using a given plugin api. Here again, such plugin calls are being proxied using the Proxy mechanism, and the timing is taken from the invocation handler.

The information will contain the name of the plugin api class and the method being invoked.

Example

Kill Bill will return a json object containing the hierarchy for the calls along with the time spent in uSec (1000 nanoseconds or 0.000001 seconds).

Example (Note the piping to extract the response header, and format the json nicely):

curl -v \
     -u admin:password \
     -H "Accept: application/json" \
     -H "X-Killbill-ApiKey: bob" \
     -H "X-Killbill-ApiSecret: lazar" \
     -H 'X-Killbill-Profiling-Req: JAXRS,API,DAO,DAO_DETAILS,DAO_CONNECTION,GLOCK' \
     "http://127.0.0.1:8080/1.0/kb/paymentMethods/3dd4e9d3-2be3-4bf2-b1e4-64a1785afd53" 2>&1 \
     | grep 'X-Killbill-Profiling-Resp' \
     | awk '{$1=$2=""; print $0}' \
     | jq '.'

Response:

{
  "rawData": [
    {
      "name": "JAXRS:1.0/kb/paymentMethods/3dd4e9d3-2be3-4bf2-b1e4-64a1785afd53",
      "durationUsec": 7654,
      "calls": [
        {
          "name": "API:getPaymentMethodById",
          "durationUsec": 4158,
          "calls": [
            {
              "name": "DAO_CONNECTION:get",
              "durationUsec": 60
            },
            {
              "name": "DAO:PaymentMethodSqlDao: getById",
              "durationUsec": 1371,
              "calls": [
                {
                  "name": "DAO_DETAILS:PaymentMethodSqlDao (raw): getById",
                  "durationUsec": 1364
                }
              ]
            }
          ]
        },
        {
          "name": "API:getAccountById",
          "durationUsec": 2593,
          "calls": [
            {
              "name": "API:getAccountById",
              "durationUsec": 2462,
              "calls": [
                {
                  "name": "API:getAccountByRecordIdInternal",
                  "durationUsec": 2408,
                  "calls": [
                    {
                      "name": "DAO_CONNECTION:get",
                      "durationUsec": 40
                    },
                    {
                      "name": "DAO:AccountSqlDao: getByRecordId",
                      "durationUsec": 877,
                      "calls": [
                        {
                          "name": "DAO_DETAILS:AccountSqlDao (raw): getByRecordId",
                          "durationUsec": 872
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        },
        {
          "name": "API:getAccountAuditLogs",
          "durationUsec": 6
        }
      ]
    }
  ]
}

We can see that for each mask there is some profiling data and such data retains the struture of the calls being made.

Client Side

The Ruby client library (https://github.com/killbill/killbill-client-ruby) has been extended to support profiling data. By passing a additional options the library will send the correct header to Kill Bill and capture the resulting information.

Currently, only JAXRS profilingData will be requested and returned for simplification.

In the example below per_thread_profiling_data will contain the timing for the various calls, which in that scenario would be post:/1.0/kb/payments/uuid and get:/1.0/kb/payments/uuid:. The provided hash will contain one key per call and the values will be an array of timing data expressed in uSec:

# Add the :profilingData option
per_thread_profiling_data = {}
options = {:username => 'admin', :password => 'password', :profilingData => per_thread_profiling_data}
res = auth_capture_task.op_create_capture(payment_id, ext_key, amount, currency, username, options)

Note that Ruby client gem will not make any attempt to synchronize access the the profiling_data map, when multiple threads are making calls. It is recommended to use per-thread data and potentially merge the results from the different threads at the end.

Tips and tricks

  • In most cases, the client will be a bottleneck. Make sure to watch the number of sockets in TIME_WAIT (watch 'netstat -an | grep 8080 | grep WAIT | wc -l') and speed up the recycling process:

echo 2 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 15000 65000 > /proc/sys/net/ipv4/ip_local_port_range
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
  • On the server side, check the expected concurrency level by watching the number of sockets in ESTABLISHED (watch 'netstat -an | grep 8080 | grep EST | wc -l')

  • Make sure to allocate enough database (org.killbill.dao.maxActive / org.killbill.billing.osgi.dao.maxActive) and plugin (org.killbill.payment.plugin.threads.nb) threads. Check your container thread pool too (e.g. conf/server.xml for Tomcat)

  • Use tools like Siege (http://www.joedog.org/siege-home/) to verify your basic setup: siege -b -t30S -c100 http://127.0.0.1:8080/1.0/kb/test/clock should yield at least 5k req./s.

  • If Shiro is spending too many CPU cycles for authentication, lower the default number of iterations (e.g. org.killbill.server.multitenant.hash_iterations=2000).

  • When using YourKit, turn off probes (especially the database ones). They cause a significant slowdown.

  • Allow a warm-up period, before starting a full test, to avoid contention in the JRuby JIT.

  • Use mjprof to extract stacktraces:

java -jar target/mjprof-1.0.jar jmx/127.0.0.1:8989/.ncontains/name,RMI TCP/.ncontains/name,RMI Reaper/.ncontains/name,RMI RenewClean/.ncontains/name,RMI Scheduler/.ncontains/name,jruby-restarter/.ncontains/name,com.google.inject.internal.util.$Finalizer/.ncontains/name,Finalizer/.ncontains/name,Reference/.ncontains/name,FelixStartLevel/.ncontains/name,FelixDispatchQueue/.ncontains/name,http-nio-8080/.ncontains/name,Abandoned connection cleanup thread/.ncontains/name,CM Event Dispatcher/.ncontains/name,CM Configuration Updater/.ncontains/name,SCR Component Actor/.ncontains/name,Timer-/.ncontains/name,telnetconsole.Listener/.ncontains/name,O worker /.ncontains/name,O boss /.ncontains/name,NioBlockingSelector/.ncontains/name,Signal Dispatcher/.ncontains/name,main/.ncontains/name,JMX server connection timeout/.ncontains/state,WAITING/.sort/state/

Load Tests

Integration Tests

The integration tests repo allows to run load tests against a running instance of Kill Bill. The README has a section explaining how they work.

Note: It is difficult to rely on single process MT threaded Ruby script to generate enough load.

GNU parallel: simple load testing script

Another way to generate load is to rely on GNU parallel:

seq 0 500 | parallel -j10 --no-notice -u ./perf_test.sh

Where perf_test.sh is:

iteration_id=$1

echo "===> Starting $iteration_id"

account_id=$(curl -v \
                  -X POST \
                  -u admin:password \
                  -H 'Content-Type: application/json' \
                  -H 'X-Killbill-ApiKey: bob' \
                  -H 'X-Killbill-ApiSecret: lazar' \
                  -H 'X-Killbill-CreatedBy: creator' \
                  --data-binary "{\"name\":\"john\",\"email\":\"[email protected]\",\"externalKey\":\"perf-$RANDOM-$RANDOM-$RANDOM\",\"currency\":\"USD\"}" \
                  http://127.0.0.1:8080/1.0/kb/accounts 2>&1 | grep Location | awk '{print $3}' | awk -F'/' '{print $7}')

account_id=$(tr -dc '[[:print:]]' <<< "$account_id")

curl  \
     -X POST \
     -u admin:password \
     -H 'Content-Type: application/json' \
     -H 'X-Killbill-ApiKey: bob' \
     -H 'X-Killbill-ApiSecret: lazar' \
     -H 'X-Killbill-CreatedBy: creator' \
     --data-binary '{"pluginName":"YOUR-PLUGIN","pluginInfo":{"properties":[{"key":"type","value":"CreditCard","isUpdatable":false},{"key":"ccType","value":"visa","isUpdatable":false},{"key":"ccName","value":"A Smith","isUpdatable":false},{"key":"email","value":"[email protected]","isUpdatable":false},{"key":"ccExpirationMonth","value":"03","isUpdatable":false},{"key":"ccExpirationYear","value":"2016","isUpdatable":false},{"key":"ccVerificationValue","value":"222","isUpdatable":false},{"key":"address1","value":"lskdjf","isUpdatable":false},{"key":"address2","value":"","isUpdatable":false},{"key":"city","value":"sdfsdfsff","isUpdatable":false},{"key":"ccFirstName","value":"sdfsdf","isUpdatable":false},{"key":"ccLastName","value":"fdsfdsf","isUpdatable":false},{"key":"zip","value":"23812","isUpdatable":false},{"key":"country","value":"USA","isUpdatable":false},{"key":"state","value":"CA","isUpdatable":false},{"key":"ccNumber","value":"4111111111111111","isUpdatable":false}]}}' \
     "http://127.0.0.1:8080/1.0/kb/accounts/${account_id}/paymentMethods?isDefault=true" > /dev/null 2>&1


curl  \
     -X POST \
     -u admin:password \
     -H 'Content-Type: application/json' \
     -H 'X-Killbill-ApiKey: bob' \
     -H 'X-Killbill-ApiSecret: lazar' \
     -H 'X-Killbill-CreatedBy: creator' \
     --data-binary '{"transactionType":"AUTHORIZE","amount":"10","currency":"USD"}' \
     "http://127.0.0.1:8080/1.0/kb/accounts/${account_id}/payments" > /dev/null 2>&1

echo "===> Finished $iteration_id"

Note that on a single machine, this client-side script will most likely be the bottleneck (spawning the processes takes too much time). Use JMeter instead.

Ruby plugins debugging

Using RubyMine

  • Add the following dependencies to your Gemfile:

gem 'ruby-debug', '~> 0.10'
gem 'ruby-debug-ide', '~> 0.6'
  • Add the following snippet at the entry point of your plugin, e.g. top of lib/cybersource.rb for CyberSource:

unless defined?(Debugger::PROG_SCRIPT)
  JRuby.objectspace = true

  require 'ruby-debug-ide'
  Debugger::PROG_SCRIPT = __FILE__

  require 'ostruct'
  options = OpenStruct.new(
      # Host name used for remote debugging
      'host' => nil,
      # Port used for remote debugging
      'port' => 1234,
      # Port used for multi-process debugging dispatcher
      'dispatcher_port' => -1,
      # Load mode (experimental)
      'load_mode' => false,
      # Stop when the script is loaded
      'stop' => false,
      # Keep frame bindings
      'frame_bind' => true,
      # Turn on line tracing
      'tracing' => false,
      # Disables interrupt signal handler
      'int_handler' => true,
      # Evaluation timeout in seconds
      'evaluation_timeout' => 20,
      # Enable all RubyMine-specific incompatible protocol extensions
      'rm_protocol_extensions' => true,
      # Enable chatchpointDeleted event
      'catchpoint_deleted_event' => false,
      # Allow to pass variable's value as nested element instead of attribute
      'value_as_nested_element' => true
  )

  Debugger.keep_frame_binding = options.frame_bind
  Debugger.tracing = options.tracing
  Debugger.evaluation_timeout = options.evaluation_timeout
  Debugger.catchpoint_deleted_event = options.catchpoint_deleted_event || options.rm_protocol_extensions
  Debugger.value_as_nested_element = options.value_as_nested_element || options.rm_protocol_extensions

  Debugger.debug_program(options)

  exit
end
  • Rebuild and redeploy the plugin:

bundle install && \
rake killbill:clean && \
rake build && \
rake killbill:package && \
rake killbill:deploy[true]
  • Create a Ruby remote debug configuration in RubyMine:

  • Remote host: localhost

  • Remote port: 1234

  • Remote root folder: /var/tmp/bundles/plugins/ruby/\<plugin\>/\<version\>/ROOT

  • Local port: 26162

  • Local root folder: path to your code

  • Restart Kill Bill with the Java property -X+O

The Kill Bill startup sequence will stop at:

Fast Debugger (ruby-debug-ide 0.6.0, ruby-debug-base 0.10.6, file filtering is supported) listens on 127.0.0.1:1234

until RubyMine is connected.

Using ruby-debug

While RubyMine offers a nice GUI, it can quickly lose its context or have a hard time stepping through gems dependencies. ruby-debug is much more reliable.

  • Add the following dependency to your Gemfile:

gem 'ruby-debug', '~> 0.10'
  • Add the following snippet at the entry point of your plugin, e.g. top of lib/cybersource.rb for CyberSource:

require 'ruby-debug'
Debugger.wait_connection = true
Debugger.start_remote(nil, 1234)
  • Set a breakpoint in the source code using debugger

  • Rebuild and redeploy the plugin:

bundle install && \
rake killbill:clean && \
rake build && \
rake killbill:package && \
rake killbill:deploy[true]
  • Restart Kill Bill with the Java property -X+O

The Kill Bill startup sequence will stop until ruby-debug is connected. To launch it:

rdebug --client -p 1234

When the breakpoint is hit, you can examine the context and step into the code:

lib/cybersource/api.rb:164
properties = merge_properties(properties, options)
(rdb:9) l
[159, 168] in lib/cybersource/api.rb
   159          # Pass extra parameters for the gateway here
   160          options = {}
   161
   162          debugger
   163
=> 164          properties = merge_properties(properties, options)
   165          super(kb_account_id, kb_payment_method_id, payment_method_props, set_default, properties, context)
   166        end
   167
   168        def delete_payment_method(kb_account_id, kb_payment_method_id, properties, context)
(rdb:9) s
/var/tmp/bundles/plugins/ruby/killbill-cybersource/3.0.0/ROOT/gems/gems/killbill-5.2.0/lib/killbill/helpers/properties_helper.rb:24
merged = properties_to_hash(properties, options)
(rdb:9) l
[19, 28] in /var/tmp/bundles/plugins/ruby/killbill-cybersource/3.0.0/ROOT/gems/gems/killbill-5.2.0/lib/killbill/helpers/properties_helper.rb
   19          end
   20          merged.merge(options)
   21        end
   22
   23        def merge_properties(properties, options = {})
=> 24          merged = properties_to_hash(properties, options)
   25
   26          properties = []
   27          merged.each do |k, v|
   28            properties << build_property(k, v)

Seeking help

If all else fail, reach out to our mailing-list for help (do not open a Github issue).

In your message, specify:

  • What you are seeing and what you are expecting

  • How to reproduce your scenario (cURL commands, code snippet, …​)

  • KPM diagnostic output, e.g.

kpm diagnostic --killbill-api-credentials=bob lazar \
               --killbill-credentials=admin password \
               --killbill-url=http://127.0.0.1:8080 \
               --killbill-web-path=/var/lib/tomcat7/webapps/ROOT/ \
               --kaui-web-path=/var/lib/tomcat7/webapps/ROOT/ \
               --log-dir=/var/lib/tomcat7/logs \
               --account_export=ACCOUNT_ID