This shows you the differences between two versions of the page.
Both sides previous revision Previous revision | |||
projects:workgroups:patient-level_prediction:best-practice [2016/05/04 08:23] jreps [Best practices] |
projects:workgroups:patient-level_prediction:best-practice [2016/05/04 15:43] prijnbeek [Best practices] |
||
---|---|---|---|
Line 11: | Line 11: | ||
===== Best practices ===== | ===== Best practices ===== | ||
- | **Data characterisation and cleaning**: Before modelling it is important to characterize the cohorts, for example by looking at the prevalence of certain covariates. Tools are being developed in the community to facilitate this. A data cleaning step is recommend, e.g. remove outliers in lab values. | + | **Data characterisation and cleaning**: Before modelling it is important to characterize the cohorts, for example by looking at the prevalence of certain covariates. Tools are being developed in the community to facilitate this. A data cleaning step is recommended, e.g. remove outliers in lab values. |
**Dealing with missing values **: A best practice still needs to be established. | **Dealing with missing values **: A best practice still needs to be established. | ||
Line 17: | Line 17: | ||
**Feature construction and selection**: Both feature construction and selection should be completely transparent and use a standardised approach, so that the modelling can be repeated and the model can be applied to unseen data. | **Feature construction and selection**: Both feature construction and selection should be completely transparent and use a standardised approach, so that the modelling can be repeated and the model can be applied to unseen data. | ||
- | **Inclusion and exclusion criteria** should be made explicit. It is recommended to do sensitivity analyses not he choices made. Visualisation tools could help and this will be further explored in the WG. | + | **Inclusion and exclusion criteria** should be made explicit. It is recommended to do sensitivity analyses on the choices made. Visualisation tools could help and this will be further explored in the WG. |
**Model development** is done using a split-sample approach. The percentage used for training could depend on the number of cases, but as a rule of thumb an 80/20 split is recommended. Hyper-parameter tuning should only be done on the training set. | **Model development** is done using a split-sample approach. The percentage used for training could depend on the number of cases, but as a rule of thumb an 80/20 split is recommended. Hyper-parameter tuning should only be done on the training set. |
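
The split-sample recommendation in the **Model development** item can be sketched as follows. This is a minimal illustration, not the workgroup's implementation. The function names `stratified_split` and `k_folds`, the toy outcome vector, and the fixed seed are all hypothetical. Stratifying by outcome and using cross-validation folds inside the training set are assumptions added here; the page only states the 80/20 rule of thumb and that hyper-parameters should be chosen on the training set alone.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=42):
    """Return (train_idx, test_idx), splitting each outcome class separately
    so that cases and non-cases are both divided roughly 80/20."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


def k_folds(indices, k=5, seed=42):
    """Yield (fit_idx, validation_idx) pairs drawn only from `indices`.
    Passing the training indices keeps the held-out test set untouched
    while hyper-parameters are compared."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fit = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield fit, validation


# Hypothetical outcome vector: 100 patients, 10 of them cases (label 1).
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)

# Hyper-parameter candidates would be scored on these folds only;
# test_idx is reserved for the final evaluation.
tuning_folds = list(k_folds(train_idx, k=5))
```

With few cases, an unstratified random split can leave the test set with almost no outcomes, which is one reason the training percentage might depend on the number of cases.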