This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
documentation:vocabulary_etl [2015/06/10 13:02] hardhouse |
documentation:vocabulary_etl [2015/12/10 12:57] hardhouse |
||
---|---|---|---|
Line 66: | Line 66: | ||
| SNOMED | UMLS Website | http://www.nlm.nih.gov/research/umls/licensedcontent/snomedctfiles.html | http://download.nlm.nih.gov/umls/kss/IHTSDO20150131/SnomedCT_RF2Release_INT_20150131.zip | | | SNOMED | UMLS Website | http://www.nlm.nih.gov/research/umls/licensedcontent/snomedctfiles.html | http://download.nlm.nih.gov/umls/kss/IHTSDO20150131/SnomedCT_RF2Release_INT_20150131.zip | | ||
| SNOMED | UK HSCIC Website | https://isd.hscic.gov.uk/trud3/user/authenticated/group/0/pack/26/subpack/102/releases | https://isd.hscic.gov.uk/artefact/trud3/1kkxe04jo40gw5q3xs0obolnvx/SNOMEDCT2/18.0.0/UK_SCT2CLFULL/uk_sct2clfull_18.0.0_20141001000001.zip | | | SNOMED | UK HSCIC Website | https://isd.hscic.gov.uk/trud3/user/authenticated/group/0/pack/26/subpack/102/releases | https://isd.hscic.gov.uk/artefact/trud3/1kkxe04jo40gw5q3xs0obolnvx/SNOMEDCT2/18.0.0/UK_SCT2CLFULL/uk_sct2clfull_18.0.0_20141001000001.zip | | ||
- | | RxNorm, NDFRT, VA Product, VA Class, ATC| UMLS website | http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html | http://download.nlm.nih.gov/umls/kss/rxnorm/RxNorm_full_03022015.zip | | + | | SNOMED | UK HSCIC Website | https://isd.hscic.gov.uk/trud3/user/authenticated/group/0/pack/6/subpack/24/releases | https://isd.hscic.gov.uk/artefact/trud3/15sgy3n9hin6pvu7rwv11go46t/NHSBSA/6.1.0/NHSBSA_DMD/nhsbsa_dmd_6.1.0_20150608000001.zip | |
+ | | SNOMED | UK HSCIC Website | https://isd.hscic.gov.uk/trud3/user/authenticated/group/0/pack/6/subpack/25/releases | https://isd.hscic.gov.uk/artefact/trud3/1mebfzrdolgi1kwge9xh0ddgtl/NHSBSA/6.3.0/NHSBSA_DMDBONUS/nhsbsa_dmdbonus_6.3.0_20150622000001.zip | | ||
+ | | RxNorm, NDFRT, VA Product, VA Class, ATC, MeSH, ICD10, GCN_SEQNO, ETC, Indication | UMLS website | http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html | http://download.nlm.nih.gov/umls/kss/rxnorm/RxNorm_full_03022015.zip | | ||
| HCPCS| CMS website | http://www.cms.gov/Medicare/Coding/HCPCSReleaseCodeSets/Alpha-Numeric-HCPCS.html | http://www.cms.gov/Medicare/Coding/HCPCSReleaseCodeSets/Downloads/2015-Annual-Alpha-Numeric-HCPCS-File.zip | | | HCPCS| CMS website | http://www.cms.gov/Medicare/Coding/HCPCSReleaseCodeSets/Alpha-Numeric-HCPCS.html | http://www.cms.gov/Medicare/Coding/HCPCSReleaseCodeSets/Downloads/2015-Annual-Alpha-Numeric-HCPCS-File.zip | | ||
| ICD9CM and ICD9Proc | CMS website | http://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes.html| http://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/Downloads/ICD-9-CM-v32-master-descriptions.zip | | | ICD9CM and ICD9Proc | CMS website | http://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes.html| http://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/Downloads/ICD-9-CM-v32-master-descriptions.zip | | ||
Line 75: | Line 77: | ||
| LOINC| UMLS website | http://www.nlm.nih.gov/research/umls/mapping_projects/loinc_to_cpt_map.html | http://download.nlm.nih.gov/umls/kss/mappings/LNC215_TO_CPT2005/LNC215_TO_CPT2005_MAPPINGS.zip | | | LOINC| UMLS website | http://www.nlm.nih.gov/research/umls/mapping_projects/loinc_to_cpt_map.html | http://download.nlm.nih.gov/umls/kss/mappings/LNC215_TO_CPT2005/LNC215_TO_CPT2005_MAPPINGS.zip | | ||
| NDC-SPL| FDA website | http://www.fda.gov/Drugs/InformationOnDrugs/ucm142438.htm | http://www.fda.gov/downloads/Drugs/DevelopmentApprovalProcess/UCM070838.zip | | | NDC-SPL| FDA website | http://www.fda.gov/Drugs/InformationOnDrugs/ucm142438.htm | http://www.fda.gov/downloads/Drugs/DevelopmentApprovalProcess/UCM070838.zip | | ||
+ | | NDC-SPL| Dailymed website | http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-drug-labels.cfm | ftp://public.nlm.nih.gov/nlmdata/.dailymed/dm_spl_release_human_rx_part1.zip | | ||
+ | | NDC-SPL| Dailymed website | http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-mapping-files.cfm | ftp://public.nlm.nih.gov/nlmdata/.dailymed/rxnorm_mappings.zip | | ||
| READ| UK HSCIC website | https://isd.hscic.gov.uk/trud3/user/authenticated/group/0/pack/1/subpack/21/releases. | select latest complete released file on the web page | | | READ| UK HSCIC website | https://isd.hscic.gov.uk/trud3/user/authenticated/group/0/pack/1/subpack/21/releases. | select latest complete released file on the web page | | ||
| READ| UK HSCIC website | https://isd.hscic.gov.uk/trud3/user/authenticated/group/0/pack/9/subpack/9/releases | select latest complete released file on the web page | | | READ| UK HSCIC website | https://isd.hscic.gov.uk/trud3/user/authenticated/group/0/pack/9/subpack/9/releases | select latest complete released file on the web page | | ||
+ | | DRG| CMS website | http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/Acute-Inpatient-Files-for-Download.html | select latest complete released file on the web page | | ||
==== Licenses/registrations required for access to some source datasets ==== | ==== Licenses/registrations required for access to some source datasets ==== | ||
Line 108: | Line 112: | ||
* SNOMED | * SNOMED | ||
* UMLS | * UMLS | ||
+ | * DRG | ||
+ | * MeSH | ||
* working | * working | ||
* Final_Assembly | * Final_Assembly | ||
Line 232: | Line 238: | ||
- Extract der2_cRefset_AssociationReferenceFull_INT_YYYYMMDD.txt from SnomedCT_Release_INT_YYYYMMDD\RF2Release\Full\Refset\Content and der2_cRefset_AssociationReferenceFull_GB1000000_YYYYMMDD.txt from SnomedCT2_GB1000000_YYYYMMDD\RF2Release\Full\Refset\Content Remove date from file name. | - Extract der2_cRefset_AssociationReferenceFull_INT_YYYYMMDD.txt from SnomedCT_Release_INT_YYYYMMDD\RF2Release\Full\Refset\Content and der2_cRefset_AssociationReferenceFull_GB1000000_YYYYMMDD.txt from SnomedCT2_GB1000000_YYYYMMDD\RF2Release\Full\Refset\Content Remove date from file name. | ||
- Load them into der2_cRefset_AssRefFull_INT and der2_cRefset_AssRefFull_UK. (rename the data files to remove any YYYYMMDD suffix so they match the file names in the ctl files) | - Load them into der2_cRefset_AssRefFull_INT and der2_cRefset_AssRefFull_UK. (rename the data files to remove any YYYYMMDD suffix so they match the file names in the ctl files) | ||
+ | - Add DM+D: Download nhsbsa_dmd_X.X.X_xxxxxxxxxxxxxx.zip from https://isd.hscic.gov.uk/trud3/user/authenticated/group/0/pack/6/subpack/24/releases | ||
+ | - Extract f_ampp2_xxxxxxx.xml, f_amp2_xxxxxxx.xml, f_vmpp2_xxxxxxx.xml, f_vmp2_xxxxxxx.xml, f_lookup2_xxxxxxx.xml, f_vtm2_xxxxxxx.xml and f_ingredient2_xxxxxxx.xml | ||
+ | - Load them into f_ampp2, f_amp2, f_vmpp2, f_vmp2, f_lookup2, f_vtm2 and f_ingredient2. Use the control files of the same name. | ||
+ | - Download nhsbsa_dmdbonus_X.X.X_YYYYMMDDXXXXXX.zip from https://isd.hscic.gov.uk/trud3/user/authenticated/group/0/pack/6/subpack/25/releases | ||
+ | - Extract weekXXYYYY-rX_X-BNF.zip/f_bnf1_XXXXXXX.xml and rename him to dmdbonus.xml | ||
+ | - Load dmdbonus.xml using dmdbonus.ctl | ||
Staging table load statements | Staging table load statements | ||
Line 244: | Line 256: | ||
sqlldr PRODV5/<password> CONTROL=der2_cRefset_AssRefFull_INT.ctl LOG=der2_cRefset_AssRefFull_INT.log BAD=der2_cRefset_AssRefFull_INT.bad | sqlldr PRODV5/<password> CONTROL=der2_cRefset_AssRefFull_INT.ctl LOG=der2_cRefset_AssRefFull_INT.log BAD=der2_cRefset_AssRefFull_INT.bad | ||
sqlldr PRODV5/<password> CONTROL=der2_cRefset_AssRefFull_UK.ctl LOG=der2_cRefset_AssRefFull_UK.log BAD=der2_cRefset_AssRefFull_UK.bad | sqlldr PRODV5/<password> CONTROL=der2_cRefset_AssRefFull_UK.ctl LOG=der2_cRefset_AssRefFull_UK.log BAD=der2_cRefset_AssRefFull_UK.bad | ||
+ | sqlldr PRODV5/<password> CONTROL=f_ampp2.ctl LOG=f_ampp2.log BAD=f_ampp2.bad | ||
+ | sqlldr PRODV5/<password> CONTROL=f_vmpp2.ctl LOG=f_vmpp2.log BAD=f_vmpp2.bad | ||
+ | sqlldr PRODV5/<password> CONTROL=f_amp2.ctl LOG=f_amp2.log BAD=f_amp2.bad | ||
+ | sqlldr PRODV5/<password> CONTROL=f_vmp2.ctl LOG=f_vmp2.log BAD=f_vmp2.bad | ||
+ | sqlldr PRODV5/<password> CONTROL=f_vtm2.ctl LOG=f_vtm2.log BAD=f_vtm2.bad | ||
+ | sqlldr PRODV5/<password> CONTROL=f_ingredient2.ctl LOG=f_ingredient2.log BAD=f_ingredient2.bad | ||
+ | sqlldr PRODV5/<password> CONTROL=f_lookup2.ctl LOG=f_lookup2.log BAD=f_lookup2.bad | ||
+ | sqlldr PRODV5/<password> CONTROL=dmdbonus.ctl LOG=dmdbonus.log BAD=dmdbonus.bad | ||
</code> | </code> | ||
Line 420: | Line 440: | ||
- Extract loinc.csv, map_to.csv, source_organization.csv and LOINC_250_MULTI-AXIAL_HIERARCHY.CSV | - Extract loinc.csv, map_to.csv, source_organization.csv and LOINC_250_MULTI-AXIAL_HIERARCHY.CSV | ||
- Load them into LOINC, MAP_TO, SOURCE_ORGANIZATION and LOINC_HIERARCHY. Use the control files of the same name. | - Load them into LOINC, MAP_TO, SOURCE_ORGANIZATION and LOINC_HIERARCHY. Use the control files of the same name. | ||
- | - Load LOINC Answers - Load LOINC_XXX_SELECTED_FORMS.zip from http://loinc.org/downloads/files/loinc-panels-and-forms-file/loinc-panels-and-forms-file-all-selected-panels-and-forms/download | + | - Load LOINC Answers and LOINC Forms - Load LOINC_XXX_PanelsAndForms.zip from http://loinc.org/downloads/files/loinc-panels-and-forms-file/loinc-panels-and-forms-file-all-selected-panels-and-forms/download |
- | - Open LOINC_XXX_SELECTED_FORMS.xlsx and load worksheet "ANSWERS" to table LOINC_ANSWERS | + | - Open LOINC_XXX_PanelsAndForms.xlsx and load worksheet "ANSWERS" to table LOINC_ANSWERS (clear columns after DisplayText, save as Unicode text (UTF-8 w/o BOM) and use loinc_answers.ctl), worksheet "FORMS" to table LOINC_FORMS (clear columns after Loinc, save as Unicode text (UTF-8 w/o BOM) and use loinc_forms.ctl) |
- Open loinc_class.csv and load it into table loinc_class | - Open loinc_class.csv and load it into table loinc_class | ||
- Download SnomedCT_LOINC_TechnologyPreview_INT_xxxxxxxx.zip from https://loinc.org/news/draft-loinc-snomed-ct-mappings-and-expression-associations-now-available.html | - Download SnomedCT_LOINC_TechnologyPreview_INT_xxxxxxxx.zip from https://loinc.org/news/draft-loinc-snomed-ct-mappings-and-expression-associations-now-available.html | ||
Line 475: | Line 495: | ||
- Extract the product.txt file. | - Extract the product.txt file. | ||
- Load product.txt into PRODUCT using control file of the same name | - Load product.txt into PRODUCT using control file of the same name | ||
+ | - Download additional source for SPL concepts and relationships from http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-drug-labels.cfm and http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-mapping-files.cfm | ||
+ | - -Full Releases of HUMAN PRESCRIPTION LABELS, HUMAN OTC LABELS, HOMEOPATHIC LABELS and REMAINDER LABELS (1st link) | ||
+ | - -SPL-RXNORM MAPPINGS (2d link) | ||
+ | - Extract LABELS using unzipxml.sh and load xml files using loadxml.ctl | ||
+ | - Extract rxnorm_mappings.zip and load rxnorm_mappings.txt using rxnorm_mappings.ctl | ||
- Run load_stage.sql | - Run load_stage.sql | ||
- Run generic_update.sql (from working directory) | - Run generic_update.sql (from working directory) | ||
Line 511: | Line 536: | ||
==== Step 11 ==== | ==== Step 11 ==== | ||
+ | |||
+ | === Overview === | ||
+ | |||
+ | Download, the DRG dataset. Decompress the gzipped files. Run the ETL script to create the staging tables schema and load the datasets. | ||
+ | |||
+ | === Details === | ||
+ | |||
+ | - Run create_source_tables.sql | ||
+ | - Download "Files for FY" (Table 5 only) from http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/Acute-Inpatient-Files-for-Download.html | ||
+ | - 2011: http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/Downloads/FY_2011_FR_Table_5.zip | ||
+ | - 2012: http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/Downloads/FY_12_NPRM_Table_5.zip | ||
+ | - 2013: http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/Downloads/FY_13_FR_Table_5.zip | ||
+ | - 2014: http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/Downloads/FY_14_FR_Table_5.zip | ||
+ | - 2015: http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/Downloads/FY2015-NPRM-Table-5.zip | ||
+ | - Extract *.txt files from archives | ||
+ | - Rename them to FY2011.txt, FY2012.txt ... FY2015.txt and sequentially load using control files of the same names | ||
+ | - Run load_stage.sql | ||
+ | - generic_update.sql NOT needed | ||
+ | |||
+ | staging table load statements | ||
+ | |||
+ | <code> | ||
+ | sqlldr PRODV5/<password> CONTROL=FY2011.ctl LOG=FY2011.log BAD=FY2011.bad | ||
+ | sqlldr PRODV5/<password> CONTROL=FY2012.ctl LOG=FY2012.log BAD=FY2012.bad | ||
+ | sqlldr PRODV5/<password> CONTROL=FY2013.ctl LOG=FY2013.log BAD=FY2013.bad | ||
+ | sqlldr PRODV5/<password> CONTROL=FY2014.ctl LOG=FY2014.log BAD=FY2014.bad | ||
+ | sqlldr PRODV5/<password> CONTROL=FY2015.ctl LOG=FY2015.log BAD=FY2015.bad | ||
+ | </code> | ||
+ | |||
+ | ==== Step 12 ==== | ||
+ | |||
+ | === Overview === | ||
+ | |||
+ | MeSH datasets contains in UMLS. | ||
+ | |||
+ | === Details === | ||
+ | |||
+ | - Run load_stage.sql | ||
+ | - Run generic_update.sql (from working directory) | ||
+ | |||
+ | |||
+ | ==== Step 13 ==== | ||
+ | |||
+ | === Overview === | ||
+ | |||
+ | ICD10 datasets contains in UMLS. | ||
+ | |||
+ | === Details === | ||
+ | |||
+ | - Run load_stage.sql | ||
+ | - Run generic_update.sql (from working directory) | ||
+ | |||
+ | |||
+ | |||
+ | ==== Step 14 ==== | ||
+ | |||
+ | === Overview === | ||
+ | |||
+ | GCN_SEQNO datasets contains in RxNorm. | ||
+ | |||
+ | === Details === | ||
+ | |||
+ | - Run create_source_tables.sql | ||
+ | - Load NDDF_PRODUCT_INFO.TXT into NDDF_PRODUCT_INFO using control file of the same name | ||
+ | - Run load_stage.sql | ||
+ | - Run generic_update.sql (from working directory) | ||
+ | |||
+ | ==== Step 15 ==== | ||
+ | |||
+ | === Overview === | ||
+ | |||
+ | ETC datasets contains in "ETC sources.zip". | ||
+ | |||
+ | === Details === | ||
+ | |||
+ | - Run create_source_tables.sql | ||
+ | - Unpack "ETC sources.zip" | ||
+ | - Load all TXT files using control files of the same name | ||
+ | - Run load_stage.sql | ||
+ | - Run generic_update.sql (from working directory) | ||
+ | |||
+ | |||
+ | ==== Step 16 ==== | ||
+ | |||
+ | === Overview === | ||
+ | |||
+ | Indication datasets contains in "Indication sources.zip". | ||
+ | |||
+ | === Details === | ||
+ | |||
+ | - Run create_source_tables.sql | ||
+ | - Unpack "Indication sources.zip" | ||
+ | - Load all TXT files using control files of the same name | ||
+ | - Run load_stage.sql | ||
+ | - Run generic_update.sql (from working directory) | ||
+ | |||
+ | ==== Step 17 ==== | ||
=== Overview === | === Overview === | ||
Line 521: | Line 643: | ||
- Run drug_strength.sql | - Run drug_strength.sql | ||
- | ==== Step 12 ==== | + | ==== Step 18 ==== |
=== Overview === | === Overview === | ||
Line 534: | Line 656: | ||
- Run manual_changes_15-Mar-2015.sql | - Run manual_changes_15-Mar-2015.sql | ||
- | ==== Step 13 ==== | + | ==== Step 19 ==== |
=== Overview === | === Overview === |