Sample data

scikit-learn includes several sample datasets in the sklearn.datasets submodule. At least two of these, available through the loader functions sklearn.datasets.load_breast_cancer and sklearn.datasets.load_diabetes, are healthcare-related. Both have already been preprocessed and are small, spanning only tens of features and a few hundred patients. The data we will use in Chapter 7, Making Predictive Models in Healthcare, is much larger and resembles the data you are likely to receive from modern healthcare organizations. These sample datasets are nonetheless useful for experimenting with scikit-learn functions.
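As a minimal sketch of that kind of experimentation (assuming scikit-learn is installed), the snippet below loads both datasets, checks their sizes, and fits a quick classifier to the breast cancer data. The choice of logistic regression, the feature scaling, and the 25% test split with random_state=0 are illustrative assumptions, not part of the text.

```python
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Each loader returns a Bunch with .data (features), .target, and .feature_names
cancer = load_breast_cancer()
diabetes = load_diabetes()
print(cancer.data.shape)    # (569, 30): 569 patients, 30 features
print(diabetes.data.shape)  # (442, 10): 442 patients, 10 features

# return_X_y=True returns the (features, target) arrays directly
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative model: scale the features, then classify malignant vs. benign
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

The breast cancer target is binary (malignant or benign), which suits classification, while the diabetes target is a continuous measure of disease progression and is better suited to regression experiments.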