documentation:software:whiterabbit:test_framework [2016/06/02 11:16] schuemie
documentation:software:whiterabbit:test_framework [2020/02/18 14:13] (current) maximmoinat
====== Rabbit-In-a-Hat testing framework ======

**NOTICE FEB 2020:** the current documentation can be found on GitHub:
http://ohdsi.github.io/WhiteRabbit/riah_test_framework.html

Rabbit-In-a-Hat can generate a framework for creating a set of [[https://en.wikipedia.org/wiki/Unit_testing|unit tests]]. The framework consists of a set of R functions tailored to the source and target schema in your ETL. These functions can then be used to define the unit tests.

Unit testing assumes that you have your data in source format somewhere in a database, and that you have already created an ETL process that extracts data from the source database, transforms it into CDM format, and loads it into a CDM schema. The unit test framework can be used to make sure that your ETL process does what it is supposed to do. For this you will need to create a new, empty database with exactly the same structure as your source database, and a new, empty database to hold the test CDM data. The framework can then insert test data into the empty source schema. Next, you run your ETL process on the test data to populate the test CDM database. Finally, you use the framework to verify that the output of the ETL in the test CDM database is what you would expect given the test source data.

===== Overview =====
One further function is available:
  * ''%%declareTest%%'' is used to group multiple statements under a single identifier. For example, ''%%declareTest(id = 1, description = "Test person ID")%%''.
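
To picture what this grouping means, here is a minimal, self-contained sketch. It assumes, hypothetically, that ''%%declareTest%%'' remembers the current test ID and description, and that every statement issued afterwards is tagged with them until the next ''%%declareTest%%'' call. The names ''%%sketchContext%%'', ''%%declareTestSketch%%'' and ''%%recordStatementSketch%%'' are illustrative stand-ins, not the functions that Rabbit-In-a-Hat generates.

<code r>
# Hypothetical sketch of grouping statements under a test ID.
# These stand-ins are NOT the functions generated by Rabbit-In-a-Hat.
sketchContext <- new.env()
sketchContext$testId <- NA
sketchContext$testDescription <- NA
sketchContext$statements <- data.frame(ID = numeric(0),
                                       DESCRIPTION = character(0),
                                       STATEMENT = character(0),
                                       stringsAsFactors = FALSE)

declareTestSketch <- function(id, description) {
  # Remember the current test; statements recorded later are tagged with it.
  sketchContext$testId <- id
  sketchContext$testDescription <- description
  invisible(NULL)
}

recordStatementSketch <- function(statement) {
  # Append one row, labelled with the most recently declared test.
  row <- data.frame(ID = sketchContext$testId,
                    DESCRIPTION = sketchContext$testDescription,
                    STATEMENT = statement,
                    stringsAsFactors = FALSE)
  sketchContext$statements <- rbind(sketchContext$statements, row)
  invisible(NULL)
}

declareTestSketch(101, "Person gender mappings")
recordStatementSketch("expect person 101: gender_concept_id = 8507")
recordStatementSketch("expect person 102: gender_concept_id = 8532")
print(sketchContext$statements)
</code>

Because both statements were recorded after the same declaration, they share ID 101. In this sketch, that is how a single test can account for several rows in a results table.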
==== Defining unit tests ====
<code>
declareTest(101, "Person gender mappings")
# Source data: two enrollment records; fields not listed keep their default values
add_enrollment(member_id = "M000000101", gender_of_member = "male")
add_enrollment(member_id = "M000000102", gender_of_member = "female")
# Expected CDM data: 8507 and 8532 are the standard OMOP concepts for male and female
expect_person(person_id = 101, gender_concept_id = 8507, gender_source_value = "male")
expect_person(person_id = 102, gender_concept_id = 8532, gender_source_value = "female")
</code>

In this example, we define a test for gender mappings. We specify that two records should be created in the ''%%enrollment%%'' table in the source schema, with different values for the ''%%member_id%%'' and ''%%gender_of_member%%'' fields. Note that the ''%%enrollment%%'' table might have many other fields, for example defining the start and end of enrollment, but we don't have to specify these here: fields that are not specified take their default values, typically taken from the WhiteRabbit scan report.

In this example, we also describe what we expect to see in the CDM schema. In this case, we formulate expectations for the ''%%person%%'' table.
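
The default-value mechanism described above can be illustrated with a minimal sketch. It assumes that an ''%%add_%%'' function merges the values you pass with a list of per-field defaults and turns the result into an ''INSERT'' statement. The function ''%%add_enrollment_sketch%%'', the ''%%enrollment_start_date%%'' field and all default values are hypothetical; the functions Rabbit-In-a-Hat actually generates, and their defaults, depend on your source schema and scan report.

<code r>
# Hypothetical defaults; in the real framework these come from your scan report.
enrollmentDefaults <- list(member_id = "M000000001",
                           gender_of_member = "male",
                           enrollment_start_date = "2010-01-01")

# Hypothetical stand-in for a generated add_enrollment() function.
add_enrollment_sketch <- function(...) {
  # Start from the defaults and overwrite only the fields supplied by the caller.
  values <- utils::modifyList(enrollmentDefaults, list(...))
  unknown <- setdiff(names(values), names(enrollmentDefaults))
  if (length(unknown) > 0) {
    stop("Unknown field(s): ", paste(unknown, collapse = ", "))
  }
  # For simplicity, quote every value as a string literal, escaping single quotes.
  quoted <- paste0("'", gsub("'", "''", unlist(values), fixed = TRUE), "'")
  sprintf("INSERT INTO enrollment (%s) VALUES (%s);",
          paste(names(values), collapse = ", "),
          paste(quoted, collapse = ", "))
}

# Returns: INSERT INTO enrollment (member_id, gender_of_member, enrollment_start_date)
#          VALUES ('M000000102', 'female', '2010-01-01');
add_enrollment_sketch(member_id = "M000000102", gender_of_member = "female")
</code>

In this sketch, ''%%enrollment_start_date%%'' is filled in from the defaults even though the call does not mention it, and a misspelled field name raises an error instead of being silently ignored.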

We can add many such tests to our R script. For an example of a full set of test definitions, see the [[https://github.com/OHDSI/JCdmBuilder/blob/master/tests/HCUPETLToV5/HcupTests.R|HCUP ETL unit tests]].

==== Generate test data in the source data schema ====

After we have defined all our tests, we need to run
<code>
insertSql <- generateInsertSql(databaseSchema = "nativeTestSchema")
testSql <- generateTestSql(databaseSchema = "cdmTestSchema")
</code>
to generate the SQL for inserting the test data into the database (''%%insertSql%%'') and for running the tests on the ETL-ed data (''%%testSql%%''). The insertion SQL assumes that the data schema already exists in ''nativeTestSchema'', and it will first remove any records that might be in the tables. We can execute the SQL in any SQL client, or we can use OHDSI's [[https://github.com/OHDSI/DatabaseConnector|DatabaseConnector package]]. For example:

<code>
library(DatabaseConnector)
# Connection details for the server that hosts the test schemas
connectionDetails <- createConnectionDetails(user = "joe",
                                             password = "secret",
                                             dbms = "sql server",
                                             server = "my_server.domain.org")
connection <- connect(connectionDetails)

# insertSql is a character vector of statements; join it into one script
executeSql(connection, paste(insertSql, collapse = "\n"))
</code>
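
If you would rather run the insertion SQL from a separate SQL client, one option is to write it to a script file first. The sketch below uses only base R. The placeholder vector ''%%exampleInsertSql%%'' and the file name are hypothetical; they stand in for the ''%%insertSql%%'' vector returned by ''%%generateInsertSql%%'' so that the snippet runs on its own.

<code r>
# Placeholder statements standing in for the output of generateInsertSql();
# with the real framework, write insertSql (or testSql) instead.
exampleInsertSql <- c(
  "INSERT INTO enrollment (member_id, gender_of_member) VALUES ('M000000101', 'male');",
  "INSERT INTO enrollment (member_id, gender_of_member) VALUES ('M000000102', 'female');"
)

# Each element is written on its own line, in a file any SQL client can open.
sqlFile <- file.path(tempdir(), "insert_test_data.sql")
writeLines(exampleInsertSql, sqlFile)
cat("SQL written to", sqlFile, "\n")
</code>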

==== Run your ETL on the test data ====

Now that the test source data is populated, you can run the ETL process you would like to test. The ETL should transform the data in ''nativeTestSchema'' into CDM data in ''cdmTestSchema''.

==== Test whether the CDM data meets expectations ====

The test SQL will create a table called ''%%test_results%%'' in ''cdmTestSchema'' and populate it with the results of the tests. (If the table already exists, it will first be dropped.) Again, we could use any SQL client to run this SQL, or we could use DatabaseConnector:

<code>
executeSql(connection, paste(testSql, collapse = "\n"))
</code>

Afterwards, we can query the results table to see the results for each test:

<code>
querySql(connection, "SELECT * FROM test_results")
</code>
This could return the following table:

^ID ^DESCRIPTION ^STATUS ^
|101 |Person gender mappings |PASS |
|101 |Person gender mappings |PASS |

In this case, there were two expect statements under test 101 (Person gender mappings), and both expectations were met, so the test passed.
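
With many tests, checking the table by eye becomes tedious. The following is a minimal sketch for reducing the results to one status per test, assuming the query result is a data frame with the ''ID'', ''DESCRIPTION'' and ''STATUS'' columns shown above (the exact column-name case may depend on your setup). A test is reported as passing only if every one of its expectations passed. The helper ''%%summarizeTestResults%%'' is hypothetical, and the example data frame simply mirrors the table above.

<code r>
# Hypothetical helper: collapse per-expectation rows into one status per test.
summarizeTestResults <- function(results) {
  # A test passes only if every one of its expectations has STATUS "PASS".
  perTest <- stats::aggregate(STATUS ~ ID + DESCRIPTION, data = results,
                              FUN = function(status) if (all(status == "PASS")) "PASS" else "FAIL")
  perTest[order(perTest$ID), ]
}

# Example input mirroring the table above; in practice, pass the data frame
# returned by querySql(connection, "SELECT * FROM test_results").
exampleResults <- data.frame(ID = c(101, 101),
                             DESCRIPTION = c("Person gender mappings", "Person gender mappings"),
                             STATUS = c("PASS", "PASS"),
                             stringsAsFactors = FALSE)

perTest <- summarizeTestResults(exampleResults)
print(perTest)

# Warn about any test with at least one unmet expectation.
failedIds <- perTest$ID[perTest$STATUS != "PASS"]
if (length(failedIds) > 0) {
  warning("Failed tests: ", paste(failedIds, collapse = ", "))
}
</code>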