Dissecting a cohort study
David Madigan (1)
Patrick Ryan (2)
Martijn J. Schuemie (2)
Marc A. Suchard (3)
- (1) Department of Statistics, Columbia University
- (2) Janssen Research & Development, LLC
- (3) Departments of Biomathematics and Human Genetics, David Geffen School of Medicine, and Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles
Exercise
Paper: Graham et al. (2015) Circulation
“Cardiovascular, bleeding and mortality risks in elderly Medicare patients treated with dabigatran or warfarin for nonvalvular atrial fibrillation”
- Team up into groups of 4 (2 programmer/designer pairs)
- Discuss:
- How to reproduce the study
- Shortcomings of design or analysis that limit reliability
- Share with class
All in 60 minutes
Getting started
Stated design: “A new-user retrospective cohort design …”
What are the target (T) and comparator (C) cohorts?
How are these populations made more exchangeable?
How is the outcome (O) cohort defined?
What is the time-at-risk for the outcome?
How are outcome rate differences estimated and assessed?
Important: Look for ambiguity or lack of transparency
We want reproducible and reliable research!
Example
What are the target (T) and comparator (C) cohorts?
- How are the medical concepts in the inclusion/exclusion criteria defined?
Now, get to work!
T and C cohorts
- Elderly (>= 65 years) Medicare beneficiaries (Parts A, B and D) with nonvalvular atrial fibrillation who initiated therapy with dabigatran (T) or warfarin (C)
Is this correct?
Inclusion criteria
All patients who:
- Have any inpatient or outpatient AF or atrial flutter ICD9 codes
- Filled at least 1 prescription for either drug between Oct 19, 2010 and Dec 31, 2012
Index date: first prescription date
- How are “inpatient or outpatient AF or atrial flutter ICD9 codes” defined?
Supplementary material provides codes for outcomes only.
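A minimal sketch of how the index date could be derived from dispensing records. It assumes a hypothetical list of (patient_id, drug, fill_date) tuples; the record layout and the function name are illustrative and are not taken from the paper. It applies only the study-window rule, so the AF diagnosis requirement and the exclusions still have to be layered on top.

```python
from datetime import date

# Study window as stated on the slide.
STUDY_START = date(2010, 10, 19)
STUDY_END = date(2012, 12, 31)
STUDY_DRUGS = {"dabigatran", "warfarin"}

def index_fills(fills):
    """Return {patient_id: (drug, index_date)} from the first in-window fill.

    `fills` is a hypothetical list of (patient_id, drug, fill_date) tuples.
    """
    index = {}
    for patient_id, drug, fill_date in sorted(fills, key=lambda f: f[2]):
        if drug not in STUDY_DRUGS:
            continue
        if not (STUDY_START <= fill_date <= STUDY_END):
            continue
        # setdefault keeps the earliest qualifying fill per patient.
        index.setdefault(patient_id, (drug, fill_date))
    return index

# Invented example records, for illustration only.
fills = [
    (1, "warfarin", date(2011, 3, 1)),
    (1, "dabigatran", date(2011, 6, 1)),
    (2, "dabigatran", date(2010, 9, 1)),   # before the study window
    (2, "dabigatran", date(2012, 1, 15)),
]
cohorts = index_fills(fills)
```

In this toy data, patient 2 has a fill before the study window. Whether that counts as prior treatment depends on the look-back rule, which is one of the ambiguities to look for.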
!Inclusion (exclusion) criteria
All patients who:
- Have < 6 months of Medicare enrollment before index date
- Were < 65
- Received prior treatment (when?) with NOAC or warfarin
- Were in a skilled nursing facility on index date (why?)
- Were in hospice on index date (why?)
- Had a hospitalization “that extended beyond the index dispensing date”
- Undergoing dialysis (when?)
- Were kidney transplant recipients
- Had diagnoses of valvular disease, DVT, PE, joint replacement during baseline 6 months
!Inclusion (exclusion) criteria
All patients who: (subset)
- Received prior treatment (when?) with NOAC or warfarin
- Had diagnoses of valvular disease, DVT, PE, joint replacement during baseline 6 months
Questions:
- Prior treatment during baseline 6 months or any time before index date?
- How are the medical concepts defined?
- What about outcomes prior to exposure?
Balancing cohorts
Using propensity score model
- Logistic regression with “initiated dabigatran” as outcome and predictors:
Sociodemographics (?)
Baseline comorbidities, medications (?)
Prescriber characteristics
“Other potentially relevant variables”
How were these chosen? Other diagnostics?
- 1:1 ratio, greedy matching
- Balance assessment via:
- Standardized mean difference (target: <= 0.1)
What was not balanced before/after matching?
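The matching and balance steps above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the propensity scores are assumed to be already estimated, the caliper value is an assumption (the slide does not mention one), and all identifiers and numbers are invented.

```python
from statistics import mean, variance

def standardized_mean_difference(target, comparator):
    """SMD = (mean_T - mean_C) / sqrt((var_T + var_C) / 2)."""
    pooled_sd = ((variance(target) + variance(comparator)) / 2) ** 0.5
    return (mean(target) - mean(comparator)) / pooled_sd

def greedy_match(target_ps, comparator_ps, caliper=0.05):
    """1:1 greedy nearest-neighbor matching without replacement.

    Both arguments map a person id to a propensity score.
    The caliper is an assumed tuning value, not taken from the paper.
    """
    available = dict(comparator_ps)
    pairs = []
    for t_id, t_score in target_ps.items():
        if not available:
            break
        # Take the closest remaining comparator for this target person.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs

# Invented scores and ages, for illustration only.
target_ps = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
comparator_ps = {"c1": 0.28, "c2": 0.33, "c3": 0.60, "c4": 0.10}
pairs = greedy_match(target_ps, comparator_ps)
smd_age = standardized_mean_difference([70, 75, 80], [72, 77, 82])
```

Greedy matching depends on the order in which target persons are processed, so the matched set can change when the input order changes. That is one more detail a reproduction needs to pin down.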
Outcome definitions
Risk of:
- Stroke
- Major gastrointestinal and intracranial bleeding
- Acute myocardial infarction
- Mortality
- How were the outcomes defined?
- Did the researchers get all of the appropriate codes?
- Was each outcome analyzed independently, or were the T+C cohorts constituted only once?
- How does the preceding choice influence the analysis?
Changing outcomes
Overlapping authors have an alternative protocol under Mini-Sentinel.
Protocol
How and why do the outcome definitions differ?
Outcome and time-at-risk
Time-to-first-event analysis using a Cox proportional hazards model
- Follow-up starts on index date + 1 day and is censored at:
Medicare disenrollment
> 3 day gap in anticoagulant supply
RX fill for a different anticoagulant
Start of hospice
Initiation of dialysis or kidney transplant
Admission to nursing facility
End of study
How does exposure duration influence the effect estimate?
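To see how the supply-gap rule drives the length of follow-up, here is a minimal sketch that computes the end of continuous exposure from a fill history. The (fill_date, days_supply) layout, the function name, and the choice to end exposure on the last covered day are assumptions made for illustration; the paper may handle early refills and the gap itself differently.

```python
from datetime import date, timedelta

def end_of_continuous_exposure(fills, max_gap_days=3):
    """Return the last covered day before a supply gap exceeds max_gap_days.

    `fills` is a hypothetical list of (fill_date, days_supply) tuples,
    sorted by fill_date, for a single patient and a single drug.
    """
    fill_date, days_supply = fills[0]
    supply_end = fill_date + timedelta(days=days_supply - 1)
    for fill_date, days_supply in fills[1:]:
        # Number of uncovered days between the previous supply and this fill.
        gap = (fill_date - supply_end).days - 1
        if gap > max_gap_days:
            break
        new_end = fill_date + timedelta(days=days_supply - 1)
        supply_end = max(supply_end, new_end)
    return supply_end

# Invented fill history, for illustration only.
fills = [
    (date(2011, 1, 1), 30),
    (date(2011, 2, 2), 30),   # 2 uncovered days before this fill
    (date(2011, 3, 12), 30),  # 8 uncovered days before this fill
]
strict_end = end_of_continuous_exposure(fills, max_gap_days=3)
lenient_end = end_of_continuous_exposure(fills, max_gap_days=14)
```

With this history, the 3-day rule ends exposure after the second fill, while a 14-day allowance keeps the patient at risk through the third fill.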
Reliability
How do the authors assess the reliability of their estimates?
- Should use negative controls to measure residual bias (systematic error) and …
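One crude way to use negative controls is sketched below. It treats the spread of log hazard ratios from negative control outcomes (where the true effect is assumed to be null) as an estimate of systematic error, then tests the estimate of interest against that wider null. This is a deliberate simplification for illustration: it ignores the sampling error of each negative control, the function name is hypothetical, and every number is invented.

```python
from math import log, sqrt
from statistics import NormalDist, mean, stdev

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def calibrated_p_value(log_hr, se, negative_control_log_hrs):
    """Test log_hr against an empirical null built from negative controls.

    Simplified sketch: the null mean and SD are taken directly from the
    negative control estimates, ignoring their individual standard errors.
    """
    mu = mean(negative_control_log_hrs)
    sigma = stdev(negative_control_log_hrs)
    z = (log_hr - mu) / sqrt(sigma ** 2 + se ** 2)
    return two_sided_p(z)

# Invented negative control hazard ratios, for illustration only.
negative_controls = [log(hr) for hr in (1.1, 1.3, 0.9, 1.2, 1.4)]
log_hr, se = log(1.2), 0.1

p_uncalibrated = two_sided_p(log_hr / se)
p_calibrated = calibrated_p_value(log_hr, se, negative_controls)
```

When the negative controls are themselves shifted away from a hazard ratio of 1, as in this toy example, an estimate that looks borderline against the theoretical null is no longer distinguishable from residual bias.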
Sensitivity to time-at-risk
Restricted to patients with:
- Initial RX <= 30 days
- >= 2 RX fills
- Increased gap allowance to 14 days
How does exposure duration now influence the effect estimate?
Subgroup analysis: age, sex
Subgroup analyses are generally a bad idea. Instead, use interaction terms in the outcome regression model.
- Reduces multiple testing trouble
- Helps adjust for correlation between predictors
Correlation is a problem:
“Lower-dose recipients were more likely to be older, to be receiving home healthcare or home oxygen, and to have higher CHADS_2 and HAS-BLED scores”
Subgroup trouble
“Increased risk of major GI bleeding with dabigatran appeared to be restricted to women aged 75+ and to men aged 85+”
- CIs are not independent
- No adjustment for multiple testing
- Dashed lines assume only 6 tests
“This shift in hazard ratio between younger and older women presented a statistically significant interaction”
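The multiple-testing concern can be made concrete with a small calculation. The sketch below computes the Bonferroni per-test threshold and, assuming independent tests (which the slide notes does not hold here), the chance of at least one false positive. The function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise rate <= alpha."""
    return alpha / n_tests

def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

per_test = bonferroni_threshold(0.05, 6)
fwer = familywise_error_rate(0.05, 6)
```

With 6 unadjusted tests at the 0.05 level, the chance of at least one spurious finding is roughly 26%, and it grows with every additional subgroup examined.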
Subgroup trouble
“The magnitude of effect for each outcome was greater in the subgroup treated with dabigatran 150 mg twice daily compared with the main analysis”
- Really?
- Where is the evidence?
Kaplan-Meier plots
Where are the confidence intervals?
Interpretation trouble
“The absolute incidence of outcome events for both dabigatran and warfarin was greatest during the first 90 days of treatment, although the hazard ratios for these outcomes were constant over time.”
Do we believe the effect is greater early in treatment?
- Large population sizes lead to more events
- even when the hazard is constant
Approximately 50% of population filled only one RX
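A small worked example of this point: with a constant event hazard and a constant rate of leaving follow-up, the expected number of events is still largest in the first window, simply because more people are at risk then. The rates and cohort size below are invented, and the exponential model is an assumption made for illustration.

```python
from math import exp

def expected_events_by_window(n_start, hazard_per_day, dropout_per_day,
                              window_days=90, n_windows=4):
    """Expected events per window under constant event and dropout hazards."""
    total = hazard_per_day + dropout_per_day
    events = []
    for k in range(n_windows):
        start = k * window_days
        end = (k + 1) * window_days
        # Expected number of people leaving the at-risk set in this window.
        leaving = n_start * (exp(-total * start) - exp(-total * end))
        # The share hazard / total of those exits are outcome events.
        events.append(leaving * hazard_per_day / total)
    return events

# Invented inputs: 10,000 starters, a rare outcome, heavy early discontinuation.
events = expected_events_by_window(10_000, hazard_per_day=0.0001,
                                   dropout_per_day=0.008)
```

The expected counts fall from one 90-day window to the next even though the hazard never changes, so a high event count early in treatment is not evidence of a larger early effect.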