(needs rewriting to accommodate schemas in the specs)
We propose changes to the tables included in the CDM schema in order to clarify their intent. Specifically we propose
The OHDSI data architecture defines the categories of data used within the broader OHDSI architecture and the schema that corresponds to each: source (native schema), standardized (CDM schema), derived (results schema), and administrative (ohdsi schema).
In April 2012, the CDM V4 specification introduced the cohort table as a location to store records that share a particular feature during a particular time span, and defined a cohort as a group of entities exposed to a common circumstance. This table has since been included in the DDL statements that create a CDM database.
When CIRCE, the initial tool for creating cohort definitions, was released, it introduced a new 'cohort' table in the results schema to store the people identified when a cohort definition is executed against a CDM database.
Since then, many other tools have been created, and new tables have appropriately been deployed in the separate results schema to store the data they derive from the CDM schema. This is the fundamental issue we seek to resolve: some derived results are stored in a table specified by the CDM schema, while others are defined and maintained in the results schema.
We propose two conventions. All tables in the CDM schema should contain data derived from the original data source (also referred to as the “native schema”). All tables in the RESULTS schema should contain data derived from the CDM schema. The RESULTS schema will include tables for Achilles results, cohort generation, Heracles results, estimation results, etc.
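The separation can be illustrated with a minimal sketch. It uses SQLite attached databases as stand-ins for the CDM and RESULTS schemas, which is an assumption made only so the example runs anywhere; a real deployment would use schemas in the target database platform. The function name `build_demo_database`, the sample rows, and the reduced column lists are hypothetical and are not part of the CDM specification.

```python
import sqlite3


def build_demo_database() -> sqlite3.Connection:
    """Create two in-memory 'schemas' that follow the proposed convention."""
    conn = sqlite3.connect(":memory:")

    # Attached in-memory databases stand in for the CDM and RESULTS schemas.
    conn.execute("ATTACH DATABASE ':memory:' AS cdm")
    conn.execute("ATTACH DATABASE ':memory:' AS results")

    # CDM schema: data derived from the original (native) source.
    # Simplified column list, for illustration only.
    conn.execute(
        "CREATE TABLE cdm.person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER)"
    )
    conn.executemany(
        "INSERT INTO cdm.person VALUES (?, ?)",
        [(1, 1950), (2, 1985), (3, 1962)],
    )

    # RESULTS schema: data derived from the CDM schema, here a toy cohort
    # of people born before 1970 (hypothetical cohort definition 1).
    conn.execute(
        "CREATE TABLE results.cohort (cohort_definition_id INTEGER, subject_id INTEGER)"
    )
    conn.execute(
        "INSERT INTO results.cohort "
        "SELECT 1, person_id FROM cdm.person WHERE year_of_birth < 1970"
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_database()
    print(demo.execute("SELECT * FROM results.cohort ORDER BY subject_id").fetchall())
```

In this sketch the cohort rows live only in the RESULTS schema, and the CDM schema holds nothing that was computed from the CDM itself.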
This is a subtle change, but one that establishes clear conventions for the intent of each schema. The development of the database migration package will also give users a new, more flexible way to create the tables required by OHDSI tools. It will remove the current limitation whereby the only way to create RESULTS schema tables is to install and run the WebAPI. The WebAPI will instead use this migration package to validate and migrate the tables required for its operation.
Additionally, we propose adopting a convention whereby no table name is reused across the schemas defined in the data architecture, in order to prevent collisions and confusion.
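A check for this naming convention could look like the following sketch. The function `find_table_name_collisions` and the schema contents in the usage example are hypothetical; the sketch assumes table names are compared case-insensitively, which may not match every database platform.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_table_name_collisions(schemas: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return each table name that appears in more than one schema.

    `schemas` maps a schema name to the table names it contains.
    Names are lower-cased before comparison (an assumption of this sketch).
    """
    owners = defaultdict(set)
    for schema_name, table_names in schemas.items():
        for table_name in table_names:
            owners[table_name.lower()].add(schema_name)
    # Keep only names owned by two or more schemas, with schemas sorted.
    return {
        table: sorted(names)
        for table, names in owners.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    # Illustrative table lists only, not the full OHDSI schemas.
    example = {
        "cdm": ["person", "cohort"],
        "results": ["cohort", "achilles_results"],
    }
    print(find_table_name_collisions(example))
```

Run against the current situation described above, the check would flag `cohort` as present in both the CDM and results schemas.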