This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
documentation:software:whiterabbit:test_framework [2016/06/03 06:22] schuemie |
documentation:software:whiterabbit:test_framework [2020/02/18 14:13] (current) maximmoinat |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Rabbit-In-a-Hat testing framework ====== | ====== Rabbit-In-a-Hat testing framework ====== | ||
+ | |||
+ | **NOTICE FEB 2020:** the current documentation can be found on Github | ||
+ | http://ohdsi.github.io/WhiteRabbit/riah_test_framework.html | ||
+ | |||
Rabbit-In-a-Hat can generate a framework for creating a set of [[https://en.wikipedia.org/wiki/Unit_testing|unit tests]]. The framework consists of a set of R functions tailored to the source and target schema in your ETL. These functions can then be used to define the unit tests. | Rabbit-In-a-Hat can generate a framework for creating a set of [[https://en.wikipedia.org/wiki/Unit_testing|unit tests]]. The framework consists of a set of R functions tailored to the source and target schema in your ETL. These functions can then be used to define the unit tests. | ||
- | Unit testing assumes that you have your data in source format somewhere in a database. You should already have created an ETL process that will extract from the source database, transform it into CDM format, and load it into a CDM schema. The unit test framework can be used to make sure that your ETL process is doing what it is supposed to do. For this you will need to create a new, empty database with exactly the same structure as your source database, and a new empty database where a test CDM database will live. The framework can be used to insert test data into the empty source schema, and once you've run your ETL process on the test data it can be used to verify that the output of the ETL (in the test CDM schema) is what you'd expect given the test source data. | + | Unit testing assumes that you have your data in source format somewhere in a database. You should already have created an ETL process that will extract from the source database, transform it into CDM format, and load it into a CDM schema. The unit test framework can be used to make sure that your ETL process is doing what it is supposed to do. For this you will need to create a new, empty database with exactly the same structure as your source database, and a new empty database where a test CDM database will live. The framework can be used to insert test data into the empty source schema. You can then run your ETL process on the test data to populate the test CDM database. you can then use the framework to verify that the output of the ETL in the test CDM database is what you'd expect given the test source data. |
===== Overview ===== | ===== Overview ===== | ||
Line 73: | Line 77: | ||
==== Generate test data in the source data schema ==== | ==== Generate test data in the source data schema ==== | ||
- | If we run our full R script two variables are created in R. The first variable is ''%%insertSql%%'', and it contains the SQL for creating the test data. The SQL assumes that the data schema already exists, and will first remove any records that might be in the tables. We can execute the SQL in any SQL client, or we can use OHDSI's [[https://github.com/OHDSI/DatabaseConnector|DatabaseConnector package]]. For example: | + | After we have defined all our tests we need to run |
+ | <code> | ||
+ | insertSql <- generateInsertSql(databaseSchema = "nativeTestSchema") | ||
+ | testSql <- generateTestSql(databaseSchema = "cdmTestSchema") | ||
+ | </code> | ||
+ | to generate the SQL for inserting the test data in the database (insertSql), and for running the tests on the ETL-ed data (testSql). The insertion SQL assumes that the data schema already exists in ''nativeTestSchema'', and will first remove any records that might be in the tables. We can execute the SQL in any SQL client, or we can use OHDSI's [[https://github.com/OHDSI/DatabaseConnector|DatabaseConnector package]]. For example: | ||
<code> | <code> | ||
Line 83: | Line 92: | ||
connection <- connect(connectionDetails) | connection <- connect(connectionDetails) | ||
- | executeSql(connection, "USE source_schema") | ||
executeSql(connection, paste(insertSql, collapse = "\n")) | executeSql(connection, paste(insertSql, collapse = "\n")) | ||
</code> | </code> | ||
Line 89: | Line 97: | ||
==== Run your ETL on the test data ==== | ==== Run your ETL on the test data ==== | ||
- | Now that the test source data is populated. You can run the ETL process you would like to test. | + | Now that the test source data is populated. You can run the ETL process you would like to test. The ETL should transform the data in ''nativeTestSchema'' to CDM data in ''cdmTestSchema''. |
==== Test whether the CDM data meets expectations ==== | ==== Test whether the CDM data meets expectations ==== | ||
- | The second variable created by our R script is called ''%%testSql%%''. It contains SQL that will create a table called ''%%test_results%%'' in your CDM schema, and populate it with the results of the tests. (If the table already exists it will first be dropped). Again, we could use any SQL client to run this SQL, or we could use DatabaseConnector: | + | The test SQL will create a table called ''%%test_results%%'' in ''cdmTestSchema'', and populate it with the results of the tests. (If the table already exists it will first be dropped). Again, we could use any SQL client to run this SQL, or we could use DatabaseConnector: |
<code> | <code> | ||
- | executeSql(connection, "USE cdm_schema") | ||
executeSql(connection, paste(testSql, collapse = "\n")) | executeSql(connection, paste(testSql, collapse = "\n")) | ||
</code> | </code> |