Title Page
Copyright and Credits
  Healthcare Analytics Made Simple
Dedication
Packt Upsell
  Why subscribe?
  PacktPub.com
Foreword
Contributors
  About the author
  About the reviewer
  Packt is searching for authors like you
Preface
  Who this book is for
  What this book covers
  To get the most out of this book
    Download the example code files
    Download the color images
    Conventions used
  Get in touch
    Reviews
Introduction to Healthcare Analytics
  What is healthcare analytics?
    Healthcare analytics uses advanced computing technology
    Healthcare analytics acts on the healthcare industry (DUH!)
    Healthcare analytics improves medical care
      Better outcomes
      Lower costs
      Ensure quality
  Foundations of healthcare analytics
    Healthcare
    Mathematics
    Computer science
  History of healthcare analytics
  Examples of healthcare analytics
    Using visualizations to elucidate patient care
    Predicting future diagnostic and treatment events
    Measuring provider quality and performance
    Patient-facing treatments for disease
  Exploring the software
    Anaconda
    Anaconda navigator
    Jupyter notebook
    Spyder IDE
    SQLite
    Command-line tools
    Installing a text editor
  Summary
  References
Healthcare Foundations
  Healthcare delivery in the US
    Healthcare industry basics
    Healthcare financing
      Fee-for-service reimbursement
      Value-based care
    Healthcare policy
      Protecting patient privacy and patient rights
      Advancing the adoption of electronic medical records
      Promoting value-based care
      Advancing analytics in healthcare
  Patient data – the journey from patient to computer
    The history and physical (H&P)
      Metadata and chief complaint
      History of the present illness (HPI)
      Past medical history
      Medications
      Family history
      Social history
      Allergies
      Review of systems
      Physical examination
      Additional objective data (lab tests, imaging, and other diagnostic tests)
      Assessment and plan
    The progress (SOAP) clinical note
  Standardized clinical codesets
    International Classification of Disease (ICD)
    Current Procedural Terminology (CPT)
    Logical Observation Identifiers Names and Codes (LOINC)
    National Drug Code (NDC)
    Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT)
  Breaking down healthcare analytics
    Population
    Medical task
      Screening
      Diagnosis
      Outcome/Prognosis
      Response to treatment
    Data format
      Structured
      Unstructured
      Imaging
      Other data format
    Disease
      Acute versus chronic diseases
      Cancer
      Other diseases
  Putting it all together – specifying a use case
  Summary
  References and further reading
Machine Learning Foundations
  Model frameworks for medical decision making
    Tree-like reasoning
      Categorical reasoning with algorithms and trees
      Corresponding machine learning algorithms – decision tree and random forest
    Probabilistic reasoning and Bayes theorem
      Using Bayes theorem for calculating clinical probabilities
        Calculating the baseline MI probability
        2 x 2 contingency table for chest pain and myocardial infarction
        Interpreting the contingency table and calculating sensitivity and specificity
        Calculating likelihood ratios for chest pain (+ and -)
        Calculating the post-test probability of MI given the presence of chest pain
      Corresponding machine learning algorithm – the Naive Bayes Classifier
    Criterion tables and the weighted sum approach
      Criterion tables
      Corresponding machine learning algorithms – linear and logistic regression
    Pattern association and neural networks
      Complex clinical reasoning
      Corresponding machine learning algorithm – neural networks and deep learning
  Machine learning pipeline
    Loading the data
    Cleaning and preprocessing the data
      Aggregating data
      Parsing data
      Converting types
      Dealing with missing data
    Exploring and visualizing the data
    Selecting features
    Training the model parameters
    Evaluating model performance
      Sensitivity (Sn)
      Specificity (Sp)
      Positive predictive value (PPV)
      Negative predictive value (NPV)
      False-positive rate (FPR)
      Accuracy (Acc)
      Receiver operating characteristic (ROC) curves
      Precision-recall curves
      Continuously valued target variables
  Summary
  References and further reading
Computing Foundations – Databases
  Introduction to databases
  Data engineering with SQL – an example case
    Case details – predicting mortality for a cardiology practice
    The clinical database
      The PATIENT table
      The VISIT table
      The MEDICATIONS table
      The LABS table
      The VITALS table
      The MORT table
  Starting an SQLite session
  Data engineering, one table at a time with SQL
    Query Set #0 – creating the six tables
      Query Set #0a – creating the PATIENT table
      Query Set #0b – creating the VISIT table
      Query Set #0c – creating the MEDICATIONS table
      Query Set #0d – creating the LABS table
      Query Set #0e – creating the VITALS table
      Query Set #0f – creating the MORT table
      Query Set #0g – displaying our tables
    Query Set #1 – creating the MORT_FINAL table
    Query Set #2 – adding columns to MORT_FINAL
      Query Set #2a – adding columns using ALTER TABLE
      Query Set #2b – adding columns using JOIN
    Query Set #3 – date manipulation – calculating age
    Query Set #4 – binning and aggregating diagnoses
      Query Set #4a – binning diagnoses for CHF
      Query Set #4b – binning diagnoses for other diseases
      Query Set #4c – aggregating cardiac diagnoses using SUM
      Query Set #4d – aggregating cardiac diagnoses using COUNT
    Query Set #5 – counting medications
    Query Set #6 – binning abnormal lab results
    Query Set #7 – imputing missing variables
      Query Set #7a – imputing missing temperature values using normal-range imputation
      Query Set #7b – imputing missing temperature values using mean imputation
      Query Set #7c – imputing missing BNP values using a uniform distribution
    Query Set #8 – adding the target variable
    Query Set #9 – visualizing the MORT_FINAL_2 table
  Summary
  References and further reading
Computing Foundations – Introduction to Python
  Variables and types
    Strings
    Numeric types
  Data structures and containers
    Lists
    Tuples
    Dictionaries
    Sets
  Programming in Python – an illustrative example
  Introduction to pandas
    What is a pandas DataFrame?
    Importing data
      Importing data into pandas from Python data structures
      Importing data into pandas from a flat file
      Importing data into pandas from a database
    Common operations on DataFrames
      Adding columns
        Adding blank or user-initialized columns
        Adding new columns by transforming existing columns
      Dropping columns
      Applying functions to multiple columns
      Combining DataFrames
      Converting DataFrame columns to lists
      Getting and setting DataFrame values
        Getting/setting values using label-based indexing with loc
        Getting/setting values using integer-based labeling with iloc
        Getting/setting multiple contiguous values using slicing
        Fast getting/setting of scalar values using at and iat
      Other operations
      Filtering rows using Boolean indexing
      Sorting rows
      SQL-like operations
        Getting aggregate row COUNTs
        Joining DataFrames
  Introduction to scikit-learn
    Sample data
    Data preprocessing
      One-hot encoding of categorical variables
      Scaling and centering
      Binarization
      Imputation
    Feature-selection
    Machine learning algorithms
      Generalized linear models
      Ensemble methods
      Additional machine learning algorithms
    Performance assessment
  Additional analytics libraries
    NumPy and SciPy
    matplotlib
  Summary
Measuring Healthcare Quality
  Introduction to healthcare measures
  US Medicare value-based programs
    The Hospital Value-Based Purchasing (HVBP) program
      Domains and measures
        The clinical care domain
        The patient- and caregiver-centered experience of care domain
        Safety domain
        Efficiency and cost reduction domain
    The Hospital Readmission Reduction (HRR) program
    The Hospital-Acquired Conditions (HAC) program
      The healthcare-acquired infections domain
      The patient safety domain
    The End-Stage Renal Disease (ESRD) quality incentive program
    The Skilled Nursing Facility Value-Based Program (SNFVBP)
    The Home Health Value-Based Program (HHVBP)
    The Merit-Based Incentive Payment System (MIPS)
      Quality
      Advancing care information
      Improvement activities
      Cost
  Other value-based programs
    The Healthcare Effectiveness Data and Information Set (HEDIS)
    State measures
  Comparing dialysis facilities using Python
    Downloading the data
    Importing the data into your Jupyter Notebook session
    Exploring the data rows and columns
    Exploring the data geographically
    Displaying dialysis centers based on total performance
    Alternative analyses of dialysis centers
  Comparing hospitals
    Downloading the data
    Importing the data into your Jupyter Notebook session
    Exploring the tables
    Merging the HVBP tables
  Summary
  References
Making Predictive Models in Healthcare
  Introduction to predictive analytics in healthcare
  Our modeling task – predicting discharge statuses for ED patients
  Obtaining the dataset
    The NHAMCS dataset at a glance
    Downloading the NHAMCS data
      Downloading the ED2013 file
      Downloading the list of survey items – body_namcsopd.pdf
      Downloading the documentation file – doc13_ed.pdf
  Starting a Jupyter session
  Importing the dataset
    Loading the metadata
    Loading the ED dataset
  Making the response variable
  Splitting the data into train and test sets
  Preprocessing the predictor variables
    Visit information
      Month
      Day of the week
      Arrival time
      Wait time
      Other visit information
    Demographic variables
      Age
      Sex
      Ethnicity and race
      Other demographic information
    Triage variables
    Financial variables
    Vital signs
      Temperature
      Pulse
      Respiratory rate
      Blood pressure
      Oxygen saturation
      Pain level
    Reason-for-visit codes
    Injury codes
    Diagnostic codes
    Medical history
    Tests
    Procedures
    Medication codes
    Provider information
    Disposition information
    Imputed columns
    Identifying variables
    Electronic medical record status columns
    Detailed medication information
    Miscellaneous information
  Final preprocessing steps
    One-hot encoding
    Numeric conversion
    NumPy array conversion
  Building the models
    Logistic regression
    Random forest
    Neural network
  Using the models to make predictions
  Improving our models
  Summary
  References and further reading
Healthcare Predictive Models – A Review
  Predictive healthcare analytics – state of the art
  Overall cardiovascular risk
    The Framingham Risk Score
    Cardiovascular risk and machine learning
  Congestive heart failure
    Diagnosing CHF
    CHF detection with machine learning
    Other applications of machine learning in CHF
  Cancer
    What is cancer?
    ML applications for cancer
    Important features of cancer
      Routine clinical data
      Cancer-specific clinical data
      Imaging data
      Genomic data
      Proteomic data
    An example – breast cancer prediction
      Traditional screening of breast cancer
      Breast cancer screening and machine learning
  Readmission prediction
    LACE and HOSPITAL scores
    Readmission modeling
  Other conditions and events
  Summary
  References and further reading
The Future – Healthcare and Emerging Technologies
  Healthcare analytics and the internet
    Healthcare and the Internet of Things
    Healthcare analytics and social media
      Influenza surveillance and forecasting
      Predicting suicidality with machine learning
  Healthcare and deep learning
    What is deep learning, briefly?
    Deep learning in healthcare
      Deep feed-forward networks
      Convolutional neural networks for images
      Recurrent neural networks for sequences
  Obstacles, ethical issues, and limitations
    Obstacles
    Ethical issues
    Limitations
  Conclusion of this book
  References and further reading
Other Books You May Enjoy
  Leave a review - let other readers know what you think