Review questions and exercises

  1. What is the difference between open data and proprietary databases?
  2. Is it enough for learners in the area of data science to use open data?
  3. Where can we access open public data?
  4. From the UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/index.php, download a dataset called Wine. Write a program in R to import it.
  5. From the UCI Machine Learning Repository, download a dataset called Forest Fire. Write a program in Python to import it.
  6. From the UCI Machine Learning Repository, download a dataset called Bank Marketing. Write a program in Octave to import it. Answer the following questions: 1) How many banks are there? 2) What is the cost?
  7. How can we find all R functions whose names start with read. (note the trailing dot)?
  8. How can we find more information on an R function called read.xls()?
  9. Explain the differences between two R functions: save() and saveRDS().
  10. Find more information about the read_clipboard() function included in the Python pandas package.
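
The import exercises above share one pattern: obtain a delimited text file, then parse it into a table. A minimal Python sketch of that pattern follows, assuming a comma-separated file with a header row. The inline sample, its column names, and its values are invented for illustration and stand in for a downloaded file; in practice the `StringIO` object would be replaced by the local path of the file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a downloaded CSV file; the column
# names and values are invented for illustration only.
sample_csv = io.StringIO(
    "month,temp,wind,area\n"
    "mar,8.2,6.7,0.0\n"
    "oct,18.0,0.9,0.0\n"
    "aug,22.8,4.0,1.5\n"
)

# read_csv accepts a local path, a URL, or any file-like object.
# Pass sep=";" (or another delimiter) if the file is not comma-separated.
df = pd.read_csv(sample_csv)

print(df.shape)   # number of rows and columns
print(df.head())  # first few rows
print(df.dtypes)  # inferred column types
```

Checking the shape and the inferred column types right after the import is a quick way to catch a wrong delimiter or a missing header row.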

 

  1. What is the Quandl platform? What kinds of data could we download from Quandl?
  2. Write both R and Python programs to download GDP (Gross Domestic Product) data from the Quandl platform.
  3. When loading an R dataset, what is the difference between using the load() function and the readRDS() function?
  4. After importing the Python pandas package, explain why we have the following error message:
  1. First, download a ZIP file called bank-fall.zip from http://archive.ics.uci.edu/ml/datasets/Bank+Marketing. Unzip the file to get a CSV file; see the related code that follows:

Generate two R datasets called bank.RData and bank.rds, and answer the following questions: a) What is the average age? b) What percentage of people are married? c) Is the default probability of those who are married higher than that of those who are single?
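
The exercise asks for R datasets, but the three summary questions can be sketched in Python as well. The example below is only an illustration: it assumes the CSV file is semicolon-delimited and has columns named `age`, `marital`, and `default` (with `yes`/`no` values), and the rows shown are invented rather than taken from the real file.

```python
import io

import pandas as pd

# Invented rows for illustration; the column names (age, marital, default)
# and the ";" delimiter are assumptions about the unzipped CSV file.
sample_csv = io.StringIO(
    "age;marital;default\n"
    "30;married;no\n"
    "40;single;yes\n"
    "50;married;no\n"
    "60;single;no\n"
)
bank = pd.read_csv(sample_csv, sep=";")

# a) average age
mean_age = bank["age"].mean()

# b) percentage of people who are married
pct_married = (bank["marital"] == "married").mean() * 100

# c) default rate (share of "yes") within each marital status
default_rate = (bank["default"] == "yes").groupby(bank["marital"]).mean()

print(mean_age)
print(pct_married)
print(default_rate)
```

Comparing the `married` and `single` entries of `default_rate` answers question c) for whatever data is loaded.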

  1. How do we merge two datasets in R?
  2. Write a Python program to download IBM's daily data from Quandl and merge it with the Fama-French three-factor data. To get a Fama-French daily factor time series, we could go to http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html or download a dataset at http://canisius.edu/~yany/python/data/ffDaily.pkl.
  3. Generate both R and Python datasets for the monthly Fama-French-Carhart four factors. Both time series can be downloaded from Professor French's data library.
  4. Write a Python program to merge FRED/GDP data with market index data.
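
The merging exercises above all come down to joining two tables on a shared date column. The following is a minimal sketch using `pandas.merge`, not a full solution: the column names `ret` and `mkt_rf` are hypothetical, and the values are invented rather than downloaded from Quandl or Professor French's data library.

```python
import pandas as pd

# Invented values for illustration; real data would come from a price
# source and a factor file, each keyed by date.
stock = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    "ret": [0.010, -0.020, 0.005],
})
factors = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-03", "2020-01-06", "2020-01-07"]),
    "mkt_rf": [-0.007, 0.003, -0.002],
})

# An inner join keeps only the dates present in both tables.
merged = pd.merge(stock, factors, on="date", how="inner")

# An outer join keeps every date and fills the gaps with NaN.
merged_all = pd.merge(stock, factors, on="date", how="outer")

print(merged)
print(merged_all)
```

For series with different frequencies, such as quarterly GDP and a daily market index, the dates would first need to be aligned to a common frequency before a join like this gives meaningful rows.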