Understanding Circe-be Logic Through Capr for Generating Complex Cohort Definitions
1 Introduction
1.1 ATLAS
Typically, we create cohort definitions for OHDSI studies using ATLAS. ATLAS has several benefits, in particular a nice user interface for visualizing the cohort definition we are trying to create. However, ATLAS can be a bit tedious, particularly when we must create several cohort definitions with a similar structure (a template). We can deal with these situations by copying and pasting, but this can lead to errors in cohort logic and can also be quite time consuming.
1.2 Capr
Given the challenges of templating in ATLAS, the R package Capr (pronounced like the edible flower bud, caper) was created as a programmatic interface for defining cohort logic for OMOP data, serving as an alternative avenue for generating cohort definitions for OHDSI studies. The advantage of scripting cohort definitions is that we can define a template of our definition and iterate across multiple possibilities. Capr emphasizes the DRY principle in coding (“Do not Repeat Yourself”), which encourages programmers to define something once instead of multiple times. This sounds great, but it comes with a slight change in mindset when defining cohort definitions. To use Capr properly, users need to understand the underlying logic expressed in circe-be. Capr attempts to populate the same json structure as one would in ATLAS, essentially a backdoor to circe-be over which we have a bit more control.
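To illustrate the templating idea in plain R, the sketch below defines a structure once and fills it in for several inputs. It is not Capr code: make_cohort_spec, the drug class names, and the concept IDs are all hypothetical and exist only for this example.

```r
# A hypothetical template: one function describes the shared structure,
# and the arguments carry what differs between cohorts.
make_cohort_spec <- function(name, concept_ids, prior_days = 365L) {
  list(
    name = name,
    conceptIds = concept_ids,
    observationWindow = list(priorDays = prior_days, postDays = 0L)
  )
}

# Invented inputs: two drug classes that share the same cohort structure
drug_classes <- list(
  classA = c(101L, 102L),
  classB = c(201L)
)

# Iterate the template over every input instead of copying and pasting
specs <- Map(make_cohort_spec, names(drug_classes), drug_classes)
```

Changing the template in one place changes every generated specification, which is the property that copy and paste lacks.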
1.3 circe-be
Under the hood of ATLAS lies the circe-be software, essentially a bridge from clinical concept to computational query. When users fill out a cohort definition in ATLAS they are populating a json file. Think of the json like a “Mad-lib”: you are entering pieces into a structure that formulates a coherent message. circe-be takes these instructions and translates them into a sql query that we can run against the OMOP data. This is a powerful tool because it standardizes queries across the OHDSI network. To create this standardized query, circe-be builds elements of a sql script based on underlying components. Some of these components are familiar (primary criteria, inclusion rules, etc.) while others are not as well known (query, count, group). The purpose of this demo is to use Capr to help users understand the underlying constructs of circe-be. Understanding these constructs will improve users' ability to create complex cohort definitions in ATLAS and Capr and introduce the ideas behind templating in Capr.
2 Tutorial
In this tutorial we use Capr to show the circe-be structures. In particular we will demonstrate five structures: 1) Concept Set Expression (CSE), 2) Query, 3) Attribute, 4) Count and 5) Group. We provide code blocks showing how to create each circe-be structure in Capr. Each code block is also accompanied by dplyr code that expresses how the circe-be structure is constructed individually. The idea is to show how the Capr object would be deployed once it goes through the conversion process to standardized sql.
For our example we walk through the eMERGE phenotype for defining a Type 2 Diabetes (T2D) case. This is a complex algorithm with five potential pathways to define a T2D case, as shown in figure 1. To construct this full pathway we need to define the sub-components in the circe-be logic. We use Capr as a means of demonstrating each component of the circe-be semantic model and interfacing with these sub-components used to build cohort definitions. We build this cohort definition using the test CMS Synpuf database which includes the latest OMOP vocabulary used to define the logic. At the end of the tutorial we provide the full Capr code to build this complex cohort.
You can also watch these YouTube videos to learn more about circe-be structures through ATLAS.
2.1 Concept Set Expression
The first circe-be structure is the concept set expression. This is essentially a code list used to define a clinical event of interest. The expression aspect of this structure adds relational structure to the code set, incorporating descendant logic and adding exceptions to the code list for a refined definition. In the eMERGE algorithm, some of the paths require T2D medications in order to find a case of T2D. From the documentation we can get the list of RxNorm codes and then find the OMOP concept IDs for them. But databases record events at a finer level than the ingredient: there can be different dosages, brands and delivery methods. We want to count all of these variations, and this can be done using a concept set expression. In the Capr code we look up the drug IDs and then switch on the includeDescendants toggle to add all concepts that descend from them in the hierarchy. Now when we want to look up a record of a T2D medication, we do not just look up the ingredient concept, we look at all the descendants (quite powerful 💪).
The code below is how we construct a concept set expression using Capr. The first step is to look up the concept ids in the concept table of the OMOP vocabularies and then merge this with the concept_ancestor table to find all descendants. When we run this line of code, remember that you need to establish a connection to your database to access the vocabulary tables in the defined schema.
#define T2D medication ingredients
T2RxIds <- c(1502809L, 1502826L, 1503297L, 1510202L,
             1515249L, 1516766L, 1525215L, 1529331L,
             1530014L, 1547504L, 1559684L, 1560171L,
             1580747L, 1583722L, 1594973L, 1597756L)
#create CSE in Capr
T2Rx <- getConceptIdDetails(
  conceptIds = T2RxIds,
  connectionDetails = execution_settings$connectionDetails,
  vocabularyDatabaseSchema = execution_settings$vocabulary_schema) %>%
  createConceptSetExpression(
    Name = "Type 2 Diabetes Medications",
    includeDescendants = TRUE)
To give further context as to what is going on here, we use dplyr to mimic the sql query that takes place behind the scenes. Again we take our list of ingredient concepts and find all the descendants through the concept_ancestor table. The OHDSI sql generated by circe-be typically holds this result in a temp table.
In Capr our first step in defining a cohort is to define the CSE. This construct holds all of the codes we want to look up across the different tables in CDM. To build a cohort definition we need to make sure our list of concepts is thorough.
# example query for CSE
allT2dRx <- cdm$concept %>%
  dplyr::inner_join(
    cdm$concept_ancestor,
    by = c("concept_id" = "descendant_concept_id")) %>%
  dplyr::filter(ancestor_concept_id %in% T2RxIds,
                is.null(invalid_reason)) %>%
  select(concept_id:invalid_reason) %>%
  dplyr::collect()
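To make the descendant expansion concrete without a database connection, here is a minimal base R sketch on made-up tables. The concept IDs, names, and the expand_descendants helper are invented for illustration; only the table and column names follow the OMOP vocabulary tables used above.

```r
# Toy stand-ins for the OMOP concept and concept_ancestor tables.
# All IDs and names here are invented for illustration only.
concept <- data.frame(
  concept_id = c(1L, 11L, 12L, 2L, 21L),
  concept_name = c("ingredient A", "A 500 mg tablet", "A branded tablet",
                   "ingredient B", "B 10 mg tablet"),
  stringsAsFactors = FALSE
)
concept_ancestor <- data.frame(
  ancestor_concept_id   = c(1L, 1L, 1L, 2L, 2L),
  descendant_concept_id = c(1L, 11L, 12L, 2L, 21L)
)

# Expand a list of ingredient IDs to every descendant concept
# (each concept is listed as its own descendant, so ingredients are kept)
expand_descendants <- function(ids) {
  keep <- concept_ancestor$ancestor_concept_id %in% ids
  descendant_ids <- unique(concept_ancestor$descendant_concept_id[keep])
  concept[concept$concept_id %in% descendant_ids, ]
}

expanded <- expand_descendants(1L)
print(expanded)
```

Asking for ingredient 1 returns the ingredient plus its two product-level descendants, which is what includeDescendants = TRUE requests for each listed concept.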
2.2 Query
Recall the structure of the CDM (shown in the margin). We have a relational database, so to extract data from this format we need to merge tables using keys. For example, say we are looking for metformin users. We would merge the concept_id for metformin from the concept table to the drug_concept_id in the drug_exposure table and find person_id for patients that took metformin. Simply put we are performing a type of query on the relational database.
The Capr code for a query is very simple. We need to define the domain in which to look up “hits” of the concept set. Queries in Capr are defined by the create verb followed by the name of the clinical table. For example, if we want a condition occurrence the Capr signature is createConditionOccurrence. The input of the query must be a Concept Set Expression object. Using Capr we are simply telling the circe-be engine that we want to look up a particular concept set in the designated domain.
# create query in Capr
T2RxQuery <- createDrugExposure(conceptSetExpression = T2Rx)
Further we can show how this declaration would be deployed in circe-be through the code below. We join the CSE we made earlier with the drug exposure table looking for persons that have a record of the code in their patient history.
# example query for a query ¯\_(ツ)_/¯
query <- cdm$drug_exposure %>%
  inner_join(allT2dRx, by = c("drug_concept_id" = "concept_id")) %>%
  select(drug_exposure_id, person_id,
         drug_concept_id, drug_exposure_start_date,
         concept_name, vocabulary_id) %>%
  collect()
2.3 Attribute
Closely associated with a query is an attribute. An attribute modifies the query by keeping only the persons whose record holds a particular value in another column of the clinical table. All attributes are based on columns in the clinical domain table or in the person table. For example, in the T2D example we want random glucose measurements with a value greater than 200 mg/dL (which would designate an abnormal measure). In this case we would look up all persons with a Random Glucose concept and then search the value_as_number column to see if the listed value is greater than 200. When constructing cohort definitions, remember that the attribute complements the query.
Using Capr, an attribute object is first defined outside the query and then placed in a list within the query command. In the code block below we create an object called value200 which holds an attribute to modify the query. This attribute is called an OpAttribute. We are deploying a mathematical operator or inequality to describe the logic of interest in our query. Other attributes include a ConceptAttribute and a LogicAttribute.
#create Random glucose CSE in Capr
AbLabRandomGluc <- getConceptCodeDetails(
  conceptCode = c("2339-0", "2345-7"),
  vocabulary = "LOINC",
  connectionDetails = execution_settings$connectionDetails,
  vocabularyDatabaseSchema = execution_settings$vocabulary_schema,
  mapToStandard = TRUE) %>%
  createConceptSetExpression(Name = "Abnormal Lab Random Glucose",
                             includeDescendants = TRUE)
# create an attribute of > 200 mg/dl ("gt" is strictly greater than)
value200 <- createValueAsNumberAttribute(Op = "gt", Value = 200L)
#Create Random glucose Query with value attribute
AbLabRandomGlucQuery <- createMeasurement(
  conceptSetExpression = AbLabRandomGluc,
  attributeList = list(value200)
)
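The difference between the "gt" and "gte" operator codes matters at the threshold itself. The base R sketch below shows it on invented glucose values; op_functions and apply_op are hypothetical helpers written for this example and are not part of Capr.

```r
# Operator codes mapped to R comparison functions (an illustrative subset)
op_functions <- list(gt = `>`, gte = `>=`, lt = `<`, lte = `<=`, eq = `==`)

# apply_op() is a hypothetical helper, not part of Capr
apply_op <- function(values, op, threshold) {
  op_functions[[op]](values, threshold)
}

glucose <- c(150, 200, 240)  # invented lab values in mg/dL
strictly_above <- apply_op(glucose, "gt", 200)
at_or_above <- apply_op(glucose, "gte", 200)
```

With "gt" a value of exactly 200 is excluded, while "gte" keeps it.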
The example shown for Capr is a tad tricky to reproduce in synpuf because there are limited lab values, so we show an example using a gender attribute instead. Again, the attribute is a modifier of the query: we filter the matching persons by the existence of another value. In the case of gender we have a concept ID for female (8532), so to find females who have taken a T2D medication we first do a filter join to find the persons with a “hit” from the CSE and then join on the person table by person_id. From this set of persons we keep only those with a concept ID of 8532 in the gender_concept_id column of the person table. As you can see, the attribute is additional filtering logic that modifies the query.
# example query for an attribute
attribute <- cdm$drug_exposure %>%
  inner_join(allT2dRx, by = c("drug_concept_id" = "concept_id")) %>%
  inner_join(cdm$person, by = c("person_id")) %>%
  filter(gender_concept_id == 8532L) %>%
  select(drug_exposure_id, person_id, gender_concept_id,
         drug_concept_id, drug_exposure_start_date,
         concept_name) %>%
  collect()
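The same query-then-attribute sequence can be tried without a database. The sketch below uses small invented data frames in base R; the rows and the concept_set vector are assumptions made for illustration, while the column names follow the CDM tables used above.

```r
# Toy clinical tables; every value is invented for illustration.
drug_exposure <- data.frame(
  person_id = c(1L, 2L, 3L, 3L),
  drug_concept_id = c(11L, 21L, 12L, 99L),
  drug_exposure_start_date = as.Date(c("2020-01-05", "2020-02-10",
                                       "2020-03-15", "2020-04-20"))
)
person <- data.frame(
  person_id = c(1L, 2L, 3L),
  gender_concept_id = c(8532L, 8507L, 8532L)
)
concept_set <- c(11L, 12L, 21L)  # stand-in for an expanded concept set

# Query: records in the domain table that hit the concept set
query_hits <- drug_exposure[drug_exposure$drug_concept_id %in% concept_set, ]

# Attribute: keep only hits for persons recorded as female (8532)
with_person <- merge(query_hits, person, by = "person_id")
attribute_hits <- with_person[with_person$gender_concept_id == 8532L, ]
```

The query alone returns three records; the attribute narrows them to the two that belong to female persons.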
2.4 Count
So far we have not incorporated time into our queries, only the existence of a code in a table. However, timing is vital when determining a cohort of patients. We need to ensure that, of the initial set of patients, we keep only people who have experienced a medical event at some plausible point in their patient history. For example, if we want persons with T2D, we want to ensure they do not have prior type 1 diabetes. This is the essence of the circe-be count structure: we enumerate patients based on the temporal occurrence of a medical event. Counts are typically only defined in Additional Criteria and Inclusion Rules because we need the occurrence of a prior event in order to define a window in the patient history over which to enumerate.
In Capr counts require: 1) a query, 2) a count and 3) a timeline. The timeline sets the window of observation relative to another event; in circe-be this is typically relative to the primary criteria (unless we are building a correlated criteria attribute). In the example below we define two counts: 1) at least 1 occurrence of a T2D medication and 2) no occurrence of a T2D medication. Note these are for different pathways in the T2D eMERGE algorithm. Relative to the primary criteria we define our window as all time before and no time after. Now we can begin to enumerate 🧮! We want to observe x occurrences of a query, where x is some value that we apply with an inequality. If we want at least 1 instance we follow the first example in the Capr code, and if we want no occurrence we follow the second example.
So if we want to create an inclusion rule, we need to understand the primary criteria before we build any rule. Next we define the window of time relative to this index event. Then we create a query for the medical event we want to observe in this window. Finally we define how many times we must observe this event in the patient history for the subject to be included or excluded.
#create timeline
tl1 <- createTimeline(
  StartWindow = createWindow(
    StartDays = "All",
    StartCoeff = "Before",
    EndDays = 0L,
    EndCoeff = "After")
)
#at least 1 T2DM medication
atLeast1T2RxCount <- createCount(
  Query = T2RxQuery,
  Logic = "at_least",
  Count = 1L,
  Timeline = tl1)
#no exposure to T2DM medication
noT2RxCount <- createCount(
  Query = T2RxQuery,
  Logic = "exactly",
  Count = 0L,
  Timeline = tl1)
An example of how circe-be deploys a count construct can be seen in the naive example below. We want to include people in the cohort if they have experienced an exposure to a T2D medication between 365 days and 1 day prior to a T2D diagnosis. As we can see, the idea of a count is to enumerate an event temporally based on some prior event. Counts are usually used within additional criteria and inclusion rules, where the temporal bounds are set by the primary criteria.
count1 <- cdm$condition_occurrence %>%
  dplyr::filter(condition_concept_id == 201826L) %>%
  dplyr::group_by(person_id) %>%
  dplyr::mutate(rn = min_rank(condition_start_date)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(rn == 1) %>%
  dplyr::select(condition_occurrence_id:condition_start_date) %>%
  dplyr::inner_join(
    cdm$drug_exposure %>%
      dplyr::select(drug_exposure_id:drug_exposure_start_date) %>%
      dplyr::inner_join(allT2dRx,
                        by = c("drug_concept_id" = "concept_id")),
    by = c("person_id")
  ) %>%
  dplyr::mutate(
    hit = dplyr::if_else(dplyr::between(
      drug_exposure_start_date,
      condition_start_date - lubridate::days(365),
      condition_start_date - lubridate::days(1)),
      1L, 0L, 0L)
  ) %>%
  dplyr::filter(hit == 1) %>%
  dplyr::group_by(person_id) %>%
  dplyr::mutate(rn = min_rank(drug_exposure_start_date)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(rn == 1) %>%
  dplyr::select(person_id:condition_start_date,
                drug_exposure_start_date,
                drug_concept_id, concept_name) %>%
  dplyr::distinct() %>%
  dplyr::collect()
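Stripped of the joins, a count is a window test followed by a comparison. The base R sketch below works through one invented patient; count_in_window and passes_count are hypothetical helper names for this example, and the dates are made up.

```r
# Count events falling in a window defined relative to an index date.
# Offsets are in days; negative values mean "before the index date".
count_in_window <- function(event_dates, index_date, start_offset, end_offset) {
  sum(event_dates >= index_date + start_offset &
        event_dates <= index_date + end_offset)
}

# Apply the count logic: "at_least" or "exactly" n occurrences
passes_count <- function(n_observed, logic, n_required) {
  switch(logic,
         at_least = n_observed >= n_required,
         exactly  = n_observed == n_required,
         stop("unknown logic: ", logic))
}

index_date <- as.Date("2020-06-01")  # invented index event (T2D diagnosis)
rx_dates <- as.Date(c("2019-01-01", "2020-01-15", "2020-05-31", "2020-07-01"))

# Window: 365 days before through 1 day before the index date
n_hits <- count_in_window(rx_dates, index_date, -365, -1)
at_least_one <- passes_count(n_hits, "at_least", 1)
no_exposure <- passes_count(n_hits, "exactly", 0)
```

Two of the four exposures fall inside the window, so this patient satisfies the "at least 1" count and fails the "exactly 0" count.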
2.5 Group
Groups are the most complex, yet most powerful, structure in the underlying circe-be semantic model. A group bundles counts and other groups together into a single piece of logic that determines whether a person is added to or omitted from a cohort. The eMERGE T2D algorithm offers excellent examples of a group. The first path towards a T2D case is no occurrence of a T2D diagnosis, at least 1 T2D medication and at least 1 abnormal lab measurement. The patient needs to pass all three of these rules in order to be added to the cohort. Interestingly, in this example we are using two counts and a group. Group objects in circe-be can hold other groups 🤯! The T2D algorithm defines abnormal labs as any one of: 1) random glucose \(> 200\) mg/dL, 2) HbA1c \(\geq 6.5\%\) and 3) fasting glucose \(\geq 125\) mg/dL. After defining this group we then need to bundle it with the count substructures for no occurrence of a T2D diagnosis and at least 1 T2D medication. The Capr code below shows how to build this structure from start to finish.
#AbLab Counts
#at least 1 abnormal HbA1c Lab
atLeast1AbLabHbA1cCount <- createCount(Query = AbLabHbA1cQuery,
                                       Logic = "at_least",
                                       Count = 1L,
                                       Timeline = tl1)
#at least 1 abnormal Fasting Glucose Lab
atLeast1AbLabFastingGlucCount <- createCount(Query = AbLabFastingGlucQuery,
                                             Logic = "at_least",
                                             Count = 1L,
                                             Timeline = tl1)
#at least 1 abnormal Random Glucose Lab
atLeast1AbLabRandomGlucCount <- createCount(Query = AbLabRandomGlucQuery,
                                            Logic = "at_least",
                                            Count = 1L,
                                            Timeline = tl1)
#ab lab group
atLeast1AbLabGroup <- createGroup(
  Name = "Abnormal labs for HbA1c, Fasting+Random Glucose",
  type = "ANY",
  criteriaList = list(
    atLeast1AbLabHbA1cCount,
    atLeast1AbLabFastingGlucCount,
    atLeast1AbLabRandomGlucCount)
)
#no occurrence of T2 Diabetes
noT2DxCount <- createCount(Query = T2DxQuery,
                           Logic = "exactly",
                           Count = 0L,
                           Timeline = tl1)
#at least 1 T2DM medication
atLeast1T2RxCount <- createCount(Query = T2RxQuery,
                                 Logic = "at_least",
                                 Count = 1L,
                                 Timeline = tl1)
# Path 1: 0 T2Dx, 1+ T2Rx and 1+ AbLab
Pathway1T2DMGroup <- createGroup(
  Name = "Pathway1",
  Description = "0 T2Dx, 1+ T2Rx and 1+ AbLab",
  type = "ALL",
  criteriaList = list(noT2DxCount, atLeast1T2RxCount),
  Groups = list(atLeast1AbLabGroup))
Again, the group example is hard to depict in synpuf data, so we simplify it to provide a {dplyr} representation. Suppose we have two count objects: persons who take T2D medications and persons who take ACE inhibitors. Of the people who are diagnosed with T2D, we want to see whether they have taken both of these medications to be in the cohort. A group allows us to combine the logic of both of these counts using a join statement, as shown below.
# ace inhibitors drugs
aceIds <- c(1308216L, 1310756L,
            1331235L, 1334456L,
            1335471L, 1340128L,
            1341927L, 1342439L,
            1363749L, 1373225L)
aceInhib <- cdm$concept %>%
  dplyr::inner_join(
    cdm$concept_ancestor,
    by = c("concept_id" = "descendant_concept_id")) %>%
  dplyr::filter(ancestor_concept_id %in% aceIds,
                is.null(invalid_reason)) %>%
  select(concept_id:invalid_reason)
# Second Count: an exposure to an ACE inhibitor before T2D Dx
count2 <- cdm$condition_occurrence %>%
  dplyr::filter(condition_concept_id == 201826L) %>%
  dplyr::group_by(person_id) %>%
  dplyr::mutate(rn = min_rank(condition_start_date)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(rn == 1) %>%
  dplyr::select(condition_occurrence_id:condition_start_date) %>%
  dplyr::inner_join(
    cdm$drug_exposure %>%
      dplyr::select(drug_exposure_id:drug_exposure_start_date) %>%
      dplyr::inner_join(aceInhib, by = c("drug_concept_id" = "concept_id")),
    by = c("person_id")
  ) %>%
  dplyr::mutate(
    hit = dplyr::if_else(dplyr::between(
      drug_exposure_start_date,
      condition_start_date - lubridate::days(365),
      condition_start_date - lubridate::days(1)),
      1L, 0L, 0L)
  ) %>%
  dplyr::filter(hit == 1) %>%
  dplyr::group_by(person_id) %>%
  dplyr::mutate(rn = min_rank(drug_exposure_start_date)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(rn == 1) %>%
  dplyr::select(person_id:condition_start_date,
                drug_exposure_start_date,
                drug_concept_id, concept_name) %>%
  dplyr::distinct()
# formulation of a group
group <- count1 %>%
  inner_join(count2, by = "person_id") %>%
  dplyr::collect()
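The nesting of groups can also be sketched as plain boolean logic. In the base R example below, evaluate_group is a hypothetical helper and the TRUE/FALSE inputs are invented results for a single person; it only illustrates how "ANY" and "ALL" combine counts and nested groups, not how circe-be implements them.

```r
# Combine logical results the way a group of type "ALL" or "ANY" would.
evaluate_group <- function(type, criteria = list(), groups = list()) {
  results <- unlist(c(criteria, groups))
  switch(type,
         ALL = all(results),
         ANY = any(results),
         stop("unknown group type: ", type))
}

# Invented per-person results for the three abnormal lab counts
ab_lab_group <- evaluate_group(
  "ANY",
  criteria = list(hba1c = FALSE, fasting_glucose = TRUE, random_glucose = FALSE)
)

# Pathway 1: no T2D diagnosis AND 1+ T2D medication AND the nested lab group
pathway1 <- evaluate_group(
  "ALL",
  criteria = list(no_t2dx = TRUE, at_least_1_t2rx = TRUE),
  groups = list(ab_lab_group)
)
```

One abnormal lab is enough to satisfy the inner "ANY" group, and the outer "ALL" group then requires that result together with both counts.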
3 eMerge T2D
In the tutorial above we defined the five essential circe-be substructures that are needed to build elements of a cohort definition. Capr builds cohort definitions from the bottom up, so we need to understand these substructures to create complex cohort definitions effectively. Understanding them improves the way we can build cohorts and create templates in Capr. The following is how one would create the full T2D algorithm from eMERGE. While this is a long code block, it shows how the fundamental pieces can be created and deployed in different combinations to formulate complex algorithms.
library(Capr)
library(DatabaseConnector)
library(CohortGenerator)
library(magrittr) # provides the %>% pipe used below
#lookup concepts for T2DM cohort -------------------
#Type 2 Diabetes Diagnosis
T2Dx <- getConceptIdDetails(
  conceptIds = 201826,
  connectionDetails = connectionDetails,
  vocabularyDatabaseSchema = vocabularyDatabaseSchema) %>%
  createConceptSetExpression(
    Name = "Type 2 Diabetes Diagnosis",
    includeDescendants = TRUE)
#Type 2 Diabetes Medications
T2RxIds <- c(1502809L, 1502826L, 1503297L, 1510202L,
             1515249L, 1516766L, 1525215L, 1529331L,
             1530014L, 1547504L, 1559684L, 1560171L,
             1580747L, 1583722L, 1594973L, 1597756L)
T2Rx <- getConceptIdDetails(
  conceptIds = T2RxIds,
  connectionDetails = connectionDetails,
  vocabularyDatabaseSchema = vocabularyDatabaseSchema) %>%
  createConceptSetExpression(
    Name = "Type 2 Diabetes Medications",
    includeDescendants = TRUE)
#Type 1 Diabetes Diagnosis
T1Dx <- getConceptIdDetails(
  conceptIds = 201254,
  connectionDetails = connectionDetails,
  vocabularyDatabaseSchema = vocabularyDatabaseSchema) %>%
  createConceptSetExpression(
    Name = "Type 1 Diabetes Diagnosis",
    includeDescendants = TRUE)
#Type 1 Diabetes Medications
T1DRxNormCodes <- paste(c(139825,274783,314684,
                          352385,400008,51428,
                          5856,86009,139953))
T1Rx <- getConceptCodeDetails(
  conceptCode = T1DRxNormCodes,
  vocabulary = "RxNorm",
  connectionDetails = connectionDetails,
  vocabularyDatabaseSchema = vocabularyDatabaseSchema,
  mapToStandard = TRUE) %>%
  createConceptSetExpression(
    Name = "Type 1 Diabetes Medications",
    includeDescendants = TRUE)
#Abnormal Lab
AbLabHbA1c <- c("4548-4", "17856-6", "4549-2", "17855-8") %>%
  getConceptCodeDetails(conceptCode = .,
                        vocabulary = "LOINC",
                        connectionDetails = connectionDetails,
                        vocabularyDatabaseSchema = vocabularyDatabaseSchema,
                        mapToStandard = TRUE) %>%
  createConceptSetExpression(Name = "Abnormal Lab HbA1c",
                             includeDescendants = TRUE)
#Ab Lab for Random Glucose (> 200 mg/dl)
AbLabRandomGluc <- c("2339-0", "2345-7") %>%
  getConceptCodeDetails(conceptCode = .,
                        vocabulary = "LOINC",
                        connectionDetails = connectionDetails,
                        vocabularyDatabaseSchema = vocabularyDatabaseSchema,
                        mapToStandard = TRUE) %>%
  createConceptSetExpression(Name = "Abnormal Lab Random Glucose",
                             includeDescendants = TRUE)
#Ab Lab for Fasting Glucose (>= 125 mg/dl)
AbLabFastingGluc <- c("1558-6") %>%
  getConceptCodeDetails(conceptCode = .,
                        vocabulary = "LOINC",
                        connectionDetails = connectionDetails,
                        vocabularyDatabaseSchema = vocabularyDatabaseSchema,
                        mapToStandard = TRUE) %>%
  createConceptSetExpression(Name = "Abnormal Lab Fasting Glucose",
                             includeDescendants = TRUE)
## Set up Queries -----------------------
#########################
#T2Rx Drug Exposure Query
#########################
T2RxQuery <- createDrugExposure(conceptSetExpression = T2Rx)
#########################
#T1Rx Drug Exposure Query
#########################
T1RxQuery <- createDrugExposure(conceptSetExpression = T1Rx)
################################
#T2Dx Condition Occurrence Query
################################
T2DxQuery <- createConditionOccurrence(conceptSetExpression = T2Dx)
################################
#T1Dx Condition Occurrence Query
#################################
T1DxQuery <- createConditionOccurrence(conceptSetExpression = T1Dx)
########################
#Abnormal Lab Query
############################
#HbA1c Query with value attribute
AbLabHbA1cQuery <- createMeasurement(conceptSetExpression = AbLabHbA1c,
                                     attributeList = list(
                                       #add attribute of >= 6.5 %
                                       createValueAsNumberAttribute(
                                         Op = "gte",
                                         Value = 6.5)
                                     ))
#RandomGluc Query with value attribute
AbLabRandomGlucQuery <- createMeasurement(conceptSetExpression = AbLabRandomGluc,
                                          attributeList = list(
                                            #add attribute of > 200 mg/dl
                                            createValueAsNumberAttribute(
                                              Op = "gt",
                                              Value = 200L)
                                          ))
#FastingGluc Query with value attribute
AbLabFastingGlucQuery <- createMeasurement(conceptSetExpression = AbLabFastingGluc,
                                           attributeList = list(
                                             #add attribute of >= 125 mg/dl
                                             createValueAsNumberAttribute(
                                               Op = "gte",
                                               Value = 125L)
                                           ))
## Create Counts -----------------
#create timeline
tl1 <- createTimeline(StartWindow = createWindow(
  StartDays = "All", StartCoeff = "Before",
  EndDays = 0L, EndCoeff = "After"))
#################
#Diagnosis Counts
#################
#no occurrence of T1 Diabetes
noT1DxCount <- createCount(Query = T1DxQuery,
                           Logic = "exactly",
                           Count = 0L,
                           Timeline = tl1)
#no occurrence of T2 Diabetes
noT2DxCount <- createCount(Query = T2DxQuery,
                           Logic = "exactly",
                           Count = 0L,
                           Timeline = tl1)
#at least 1 occurrence of T2 Diabetes
atLeast1T2DxCount <- createCount(Query = T2DxQuery,
                                 Logic = "at_least",
                                 Count = 1L,
                                 Timeline = tl1)
#at least 2 occurrence of T2 Diabetes
atLeast2T2DxCount <- createCount(Query = T2DxQuery,
                                 Logic = "at_least",
                                 Count = 2L,
                                 Timeline = tl1)
##################
#Medication Counts
##################
#at least 1 T2DM medication
atLeast1T2RxCount <- createCount(Query = T2RxQuery,
                                 Logic = "at_least",
                                 Count = 1L,
                                 Timeline = tl1)
#no exposure to T2DM medication
noT2RxCount <- createCount(Query = T2RxQuery,
                           Logic = "exactly",
                           Count = 0L,
                           Timeline = tl1)
#at least 1 T1DM medication
atLeast1T1RxCount <- createCount(Query = T1RxQuery,
                                 Logic = "at_least",
                                 Count = 1L,
                                 Timeline = tl1)
#no exposure to T1DM medication
noT1RxCount <- createCount(Query = T1RxQuery,
                           Logic = "exactly",
                           Count = 0L,
                           Timeline = tl1)
#################
#AbLab Counts
#################
#at least 1 abnormal HbA1c Lab
atLeast1AbLabHbA1cCount <- createCount(Query = AbLabHbA1cQuery,
                                       Logic = "at_least",
                                       Count = 1L,
                                       Timeline = tl1)
#at least 1 abnormal Fasting Glucose Lab
atLeast1AbLabFastingGlucCount <- createCount(Query = AbLabFastingGlucQuery,
                                             Logic = "at_least",
                                             Count = 1L,
                                             Timeline = tl1)
#at least 1 abnormal Random Glucose Lab
atLeast1AbLabRandomGlucCount <- createCount(Query = AbLabRandomGlucQuery,
                                            Logic = "at_least",
                                            Count = 1L,
                                            Timeline = tl1)
## Create Groups ----------------------------
#1) No T1Dx at any point in patient history
NoT1DxGroup <- createGroup(Name = "No Diagnosis of Type 1 Diabetes",
                           type = "ALL",
                           criteriaList = list(noT1DxCount))
#2) AbLab Group (>= 6.5% HbA1c, >= 125 mg/dl Fasting Glucose,
#> 200 mg/dl Random Glucose)
atLeast1AbLabGroup <- createGroup(
  Name = "Abnormal labs for HbA1c, Fasting+Random Glucose",
  type = "ANY",
  criteriaList = list(
    atLeast1AbLabHbA1cCount,
    atLeast1AbLabFastingGlucCount,
    atLeast1AbLabRandomGlucCount)
)
#3) Nested Criteria T2Rx precedes T1Rx
tl2 <- createTimeline(StartWindow = createWindow(
  StartDays = "All", StartCoeff = "Before",
  EndDays = 1L, EndCoeff = "Before"))
PriorT2RxCount <- createCount(
  Query = T2RxQuery,
  Logic = "at_least",
  Count = 1L,
  Timeline = tl2
)
PriorT2RxNestedGroup <- createCorrelatedCriteriaAttribute(
  createGroup(
    Name = "Nested Group T2Rx before T1Rx",
    type = "ALL",
    criteriaList = list(PriorT2RxCount)
  )
)
T2RxBeforeT1RxCount <- createDrugExposure(
  conceptSetExpression = T1Rx,
  attributeList = list(PriorT2RxNestedGroup)) %>%
  createCount(Logic = "at_least", Count = 1L,
              Timeline = tl1)
#4) Path 1: 0 T2Dx, 1+ T2Rx and 1+ AbLab
Pathway1T2DMGroup <- createGroup(
  Name = "Pathway1",
  Description = "0 T2Dx, 1+ T2Rx and 1+ AbLab",
  type = "ALL",
  criteriaList = list(noT2DxCount, atLeast1T2RxCount),
  Groups = list(atLeast1AbLabGroup))
#5) Path 2: 1+ T2Dx, 0 T1Rx, 0 T2Rx, and 1+ AbLab
Pathway2T2DMGroup <- createGroup(
  Name = "Pathway2",
  Description = "1+ T2Dx, 0 T1Rx, 0 T2Rx, and 1+ AbLab",
  type = "ALL",
  criteriaList = list(atLeast1T2DxCount, noT1RxCount, noT2RxCount),
  Groups = list(atLeast1AbLabGroup))
#6) Path 3: 1+ T2Dx, 0 T1Rx, and 1+ T2Rx
Pathway3T2DMGroup <- createGroup(
  Name = "Pathway3",
  Description = "1+ T2Dx, 0 T1Rx, and 1+ T2Rx",
  type = "ALL",
  criteriaList = list(atLeast1T2DxCount, noT1RxCount, atLeast1T2RxCount)
)
#7) Path 4: 1+ T2Dx, 1+ T1Rx, 1+T2Rx, and 1+ T2Rx < T1Rx
Pathway4T2DMGroup <- createGroup(
  Name = "Pathway4",
  Description = "1+ T2Dx, 1+ T1Rx, 1+T2Rx, and 1+ T2Rx < T1Rx",
  type = "ALL",
  criteriaList = list(atLeast1T2DxCount, atLeast1T1RxCount,
                      T2RxBeforeT1RxCount)
)
#8) Path 5: 1+ T2Dx, 1+ T1Rx, 0 T2Rx and 2+ T2Dx
Pathway5T2DMGroup <- createGroup(
  Name = "Pathway5",
  Description = "1+ T2Dx, 1+ T1Rx, 0 T2Rx and 2+ T2Dx",
  type = "ALL",
  criteriaList = list(atLeast1T2DxCount, atLeast1T1RxCount,
                      noT2RxCount, atLeast2T2DxCount)
)
#9) T2DM Case Group
T2DMCase <- createGroup(
  Name = "Case for T2DM using algorithm",
  type = "ANY",
  Groups = list(Pathway1T2DMGroup, Pathway2T2DMGroup,
                Pathway3T2DMGroup, Pathway4T2DMGroup,
                Pathway5T2DMGroup)
)
## Create Cohort Definition ----------------------------
#create Primary criteria that initially captures persons
#if they have a T2DM diagnosis, a T2Rx, or an abnormal lab
PrimaryCriteria <- createPrimaryCriteria(
  Name = "PC for T2DM Case Phenotype",
  ComponentList = list(T2DxQuery, T2RxQuery, AbLabHbA1cQuery,
                       AbLabFastingGlucQuery, AbLabRandomGlucQuery),
  ObservationWindow = createObservationWindow(),
  Limit = "All")
#create additional Criteria
#further restrict the initial capture to people with no T1Dx
AdditionalCriteria <- createAdditionalCriteria(
  Name = "AC for T2DM Case Phenotype",
  Contents = NoT1DxGroup,
  Limit = "First"
)
#create Inclusion Rules
#keep T2DM cases if they meet 1 of the 5 pathways
T2DMCase <- createGroup(
  Name = "Case for T2DM using algorithm",
  type = "ANY",
  Groups = list(Pathway1T2DMGroup, Pathway2T2DMGroup,
                Pathway3T2DMGroup, Pathway4T2DMGroup,
                Pathway5T2DMGroup)
)
InclusionRules <- createInclusionRules(
  Name = "IRs for T2DM Case Phenotype",
  Contents = list(T2DMCase),
  Limit = "First"
)
Capr::saveComponent(InclusionRules,
                    saveName = "phekbT2dCase",
                    savePath = "cohorts/components")
#create Censoring Criteria
#person exits cohort if there is a diagnosis of T1DM
CensoringCriteria <- createCensoringCriteria(
  Name = "Censor of T1DM cases",
  ComponentList = list(T1DxQuery)
)
#Create Cohort Definition
T2DMPhenotype <- createCohortDefinition(
  Name = "PheKB T2DM Definition",
  PrimaryCriteria = PrimaryCriteria,
  AdditionalCriteria = AdditionalCriteria,
  InclusionRules = InclusionRules,
  CensoringCriteria = CensoringCriteria
)
### compile circe
T2DMPhenotypeJson <- compileCohortDefinition(T2DMPhenotype)
#save inclusion rules
Capr::saveComponent(InclusionRules,
                    saveName = "phekbT2dCase",
                    savePath = "cohorts/components")
Capr::saveComponent(AdditionalCriteria, saveName = "noT1D_AC", savePath = "cohorts/components")
## Additional manipulations ---------------
#import json
sglt2Cohort <- Capr::readInCirce(jsonPath = "cohorts/json/sglt2.json",
                                 connectionDetails = connectionDetails,
                                 vocabularyDatabaseSchema = vocabularyDatabaseSchema)
#lookup drug
glp1 <- Capr::getConceptIdDetails(
  conceptIds = c(793143, 40170911, 43013171, 44816332, 45774435, 1583722),
  connectionDetails = connectionDetails,
  vocabularyDatabaseSchema = vocabularyDatabaseSchema)
# Turn into CSE
glp1CSE <- Capr::createConceptSetExpression(
  conceptSet = glp1,
  Name = "GLP1",
  includeDescendants = TRUE
)
#Create Drug Exposure Query
glp1Query <- Capr::createDrugExposure(
  conceptSetExpression = glp1CSE,
  attributeList = list(
    Capr::createAgeAttribute(Op = "gte", Value = 18),
    Capr::createFirstAttribute(),
    Capr::createOccurrenceStartDateAttribute(Op = "gt",
                                             Value = "2012-01-01")
  ))
#Create Primary Criteria
ow <- Capr::createObservationWindow(PriorDays = 365, PostDays = 0)
pc <- Capr::createPrimaryCriteria(Name = "GLP1 Exposure",
                                  ComponentList = list(glp1Query),
                                  ObservationWindow = ow,
                                  Limit = "All")
glp1Cohort <- sglt2Cohort
glp1Cohort@PrimaryCriteria <- pc