- What is the difference between open data and proprietary databases?
- Is it enough for learners in the area of data science to use open data?
- Where can we access open public data?
- From the UCI Data Depository, http://archive.ics.uci.edu/ml/index.php, download a dataset called Wine. Write a program in R to import it.
- From the UCI Data Depository, download a dataset called Forest Fires. Write a program in Python to import it.
- From the UCI Data Depository, download a dataset called Bank Marketing. Write a program in Octave to import it. Answer the following questions: 1) How many banks are there? 2) What is the cost?
- How can we find all R functions whose names start with read. (note the dot after read)?
- How can we find more information on an R function called read.xls()?
- Explain the differences between two R functions: save() and saveRDS().
- Find more information about the read_clipboard() function included in the Python pandas package.
- What is the Quandl platform? What kinds of data could we download from Quandl?
- Write both R and Python programs to download GDP (Gross Domestic Product) data from the Quandl platform.
- When loading an R dataset, what is the difference between using the load() function and the readRDS() function?
- After importing the Python pandas package, explain why we get the following error message:
![](assets/b5fb4549-6c1d-4469-92fb-c2a78eb33378.png)
- First, download a ZIP file called bank-fall.zip at http://archive.ics.uci.edu/ml/datasets/Bank+Marketing. Unzip the file to get a CSV file; see the related code that follows:
![](assets/e347f686-a7b8-4809-874f-1babfb9ae8ae.png)
Generate R datasets called bank.RData and bank.rds and answer the following questions: a) What is the average age? b) What percentage of people are married? c) Is the default probability of those who are married higher than that of those who are single?
- How do we merge two datasets in R?
- Write a Python program to download IBM's daily data from Quandl and merge it with the Fama-French three-factor data. To get the Fama-French daily factor time series, we could go to http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html or download a dataset at http://canisius.edu/~yany/python/data/ffDaily.pkl.
- Generate both R and Python datasets for the monthly Fama-French-Carhart four factors. Both time series can be downloaded from Professor French's data library.
- Write a Python program to merge FRED/GDP data with market index data.
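
As a possible starting point for the Python merging exercises above, the following minimal sketch joins two date-keyed tables with `pandas.merge`. The two small tables, the column names `ret` and `mkt_rf`, and all values are made up for illustration; they are not real IBM or Fama-French data. Real inputs would be loaded from the sources named in the exercises.

```python
import pandas as pd

# Toy values for illustration only (assumed, not real market data).
stock = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
        "ret": [0.010, -0.005, 0.002],  # hypothetical daily stock returns
    }
)
factors = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-03", "2020-01-06", "2020-01-07"]),
        "mkt_rf": [-0.007, 0.003, 0.001],  # hypothetical market excess returns
    }
)

# An inner join keeps only the dates that appear in both tables.
merged = pd.merge(stock, factors, on="date", how="inner")
print(merged)
```

Using `how="inner"` drops dates that are missing from either table; `how="left"` or `how="outer"` would keep them and fill the gaps with `NaN`, which may be preferable when the two series follow different calendars.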