
Rabbit-In-a-Hat testing framework

Rabbit-In-a-Hat can generate a framework for creating a set of unit tests. The framework consists of a set of R functions tailored to the source and target schema in your ETL. These functions can then be used to define the unit tests.

Overview

These are the steps to perform unit testing:

  1. Create the testing framework for your source and target database schemas.
  2. Using the framework in R, define a set of unit tests.
  3. Use the framework to generate testing data in the source data schema.
  4. Run your ETL on the test data to produce data in the CDM schema.
  5. Use the framework to evaluate whether the CDM data meets the defined expectations.

We recommend using RStudio for defining your unit tests. One reason is that RStudio automatically suggests matching function and argument names after you have typed only the first few characters.

Creating the testing framework

In Rabbit-In-a-Hat, open your ETL specifications. The source data schema should have been loaded from the White-Rabbit scan report, and the target data schema should be selected (usually the OMOP CDM v5). Then go to File → Generate ETL Test Framework and choose a file name with the .R extension, for example MyTestFrameWork.R.

Defining unit tests using the framework

Next, create an empty R script, and start by sourcing the R file that was just created:

source("MyTestFrameWork.R")

Run this command right away so that the function definitions become available in your R session and RStudio can offer them for auto-completion.

Available functions

The test framework defines the following functions for each table in the source schema:

  • get_defaults_<table name> shows the default field values that will be used when creating a record in the table. Initially, these defaults are taken from the White-Rabbit scan report: each field defaults to its most frequent value.
  • set_defaults_<table name> can be used to change the default values of one or more fields in the table. For example set_defaults_enrollment(enrollment_date = "2000-01-01").
  • add_<table name> can be used to specify that a record should be created in the table. The arguments can be used to specify field values. For fields where the user doesn't specify a value, the default value is used. For example add_enrollment(member_id = "M00000001").
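To make the interplay between defaults and added records concrete, here is a minimal, self-contained sketch in R. It does not reproduce the code that Rabbit-In-a-Hat generates: the `state` environment, the field names, and the initial default values are assumptions made up for illustration. It only mirrors the behaviour described above, where a record starts from the table defaults and any field passed by the user overrides its default.

```r
# Toy stand-ins for the functions of a hypothetical "enrollment" source table.
# The real generated code differs; this only illustrates the defaults mechanism.
state <- new.env()
state$defaults <- list(member_id = "M00000000", enrollment_date = "1999-06-15")
state$records <- list()

get_defaults_enrollment <- function() {
  state$defaults
}

set_defaults_enrollment <- function(...) {
  # Overwrite only the fields that are named in the call
  state$defaults <- utils::modifyList(state$defaults, list(...))
  invisible(state$defaults)
}

add_enrollment <- function(...) {
  # Start from the current defaults, then apply the user-supplied values
  record <- utils::modifyList(state$defaults, list(...))
  state$records <- c(state$records, list(record))
  invisible(record)
}

set_defaults_enrollment(enrollment_date = "2000-01-01")
add_enrollment(member_id = "M00000001")
str(state$records[[1]])
```

In this sketch the stored record has member_id "M00000001" (supplied in the call) and enrollment_date "2000-01-01" (the changed default).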

The following functions are defined for each table in the CDM schema:

  • expect_<table name> can be used to state the expectation that at least one record with the defined properties should exist in the table. For example expect_person(person_id = 1, person_source_value = "M00000001").
  • expect_no_<table name> can be used to state the expectation that no record with the defined properties should exist in the table. For example expect_no_condition_occurrence(person_id = 1).
  • expect_count_<table name> can be used to state the expectation that a specific number of records with the defined properties should exist in the table. For example expect_count_condition_occurrence(person_id = 1, rowCount = 3).
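The difference between the three kinds of expectation can be illustrated with a small, self-contained sketch. This is not how the generated framework evaluates expectations (that is not described on this page); the data frame contents and the `check_*` helper names are hypothetical and exist only to show the semantics: at least one matching record, no matching record, or an exact number of matching records.

```r
# Made-up contents of a CDM table after an ETL run (illustration only)
condition_occurrence <- data.frame(
  person_id = c(1, 1, 1, 2),
  condition_concept_id = c(101, 102, 103, 101)
)

# Count the rows that match every field = value pair in `criteria`
count_matches <- function(table, criteria) {
  keep <- rep(TRUE, nrow(table))
  for (field in names(criteria)) {
    keep <- keep & table[[field]] == criteria[[field]]
  }
  sum(keep)
}

# Hypothetical checkers mirroring expect_, expect_no_ and expect_count_
check_expect <- function(table, ...) {
  count_matches(table, list(...)) >= 1
}
check_expect_no <- function(table, ...) {
  count_matches(table, list(...)) == 0
}
check_expect_count <- function(table, rowCount, ...) {
  count_matches(table, list(...)) == rowCount
}

check_expect(condition_occurrence, person_id = 1)
check_expect_no(condition_occurrence, person_id = 3)
check_expect_count(condition_occurrence, rowCount = 3, person_id = 1)
```

With the made-up data above, all three checks return TRUE: person 1 has three condition records and person 3 has none.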

One further function is available:

  • declareTest is used to group multiple statements under a single identifier. For example declareTest(id = 1, description = "Test person ID").
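One way to picture this grouping is as a "current test" marker that tags every statement issued after it. The sketch below is a self-contained toy under that assumption; `begin_test`, `record_statement`, and the `ctx` environment are hypothetical names, not part of the generated framework, and the second test description is invented for the example.

```r
# Toy model: the grouping call sets the current test, and every later
# statement is stored together with that test's identifier.
ctx <- new.env()
ctx$test_id <- NA_real_
ctx$test_description <- NA_character_
ctx$statements <- data.frame(
  test_id = numeric(0),
  statement = character(0),
  stringsAsFactors = FALSE
)

begin_test <- function(id, description) {
  ctx$test_id <- id
  ctx$test_description <- description
  invisible(NULL)
}

record_statement <- function(statement) {
  new_row <- data.frame(
    test_id = ctx$test_id,
    statement = statement,
    stringsAsFactors = FALSE
  )
  ctx$statements <- rbind(ctx$statements, new_row)
  invisible(NULL)
}

begin_test(id = 1, description = "Test person ID")
record_statement("add a source record")
record_statement("expect a person record")

begin_test(id = 2, description = "Another test")
record_statement("expect some other record")

ctx$statements
```

Grouping statements this way means that a failing expectation can be traced back to the test, and hence the description, it belongs to.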

Defining unit tests

Using these functions, we can define tests. Here is an example unit test:

declareTest(101, "Person id")
add_core(key = "4200000000101")
expect_person(person_id = personIdPlusPlus(), person_source_value = "4200000000101")
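As a further sketch, the same pattern extends to several tests in one script. The example below assumes a source table named enrollment with a member_id field (as in the function examples earlier on this page) and an ETL that assigns person_id values in the order the source records are added; both are assumptions about your own schema and ETL, so adapt the table names, fields, and values accordingly. The script only defines tests if the generated framework file is present.

```r
framework_file <- "MyTestFrameWork.R"  # file name used earlier on this page

if (file.exists(framework_file)) {
  source(framework_file)

  # Test 1: an enrolled member should appear as a person in the CDM
  declareTest(1, "Person is created for an enrolled member")
  add_enrollment(member_id = "M00000001")
  expect_person(person_id = 1, person_source_value = "M00000001")

  # Test 2: a member without diagnoses should have no condition records
  # (person_id = 2 assumes the ETL numbers persons sequentially)
  declareTest(2, "No conditions for a member without diagnoses")
  add_enrollment(member_id = "M00000002")
  expect_no_condition_occurrence(person_id = 2)
} else {
  message("Generate ", framework_file, " with Rabbit-In-a-Hat first.")
}
```

Giving each test its own identifier and its own source records keeps the tests independent of one another.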

For an example set of test definitions, see the HCUP ETL unit tests.

Last modified: 2016/06/02 11:16 by schuemie