Summary

To summarize our findings: both PCA and LDA are feature transformation tools that construct new, optimal features from the original ones. LDA explicitly optimizes for class separation, while PCA works in an unsupervised way to capture the data's variance in fewer columns. The two are usually used in conjunction with supervised pipelines, as we showed with the iris pipeline. In the final chapter, we will work through two longer case studies that apply both PCA and LDA to text clustering and facial recognition.
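
As a refresher, the following is a minimal sketch of that kind of pipeline, using scikit-learn's built-in iris dataset. The component counts and the logistic regression classifier are illustrative choices, not necessarily the exact settings we used earlier:

```python
# A minimal sketch of supervised pipelines built around PCA and LDA,
# using scikit-learn's built-in iris dataset.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# PCA is unsupervised: it sees only X and keeps the directions of greatest
# variance. LDA is supervised: it also sees y and keeps the directions
# that best separate the classes.
pca_pipe = Pipeline([("pca", PCA(n_components=2)),
                     ("clf", LogisticRegression(max_iter=1000))])
lda_pipe = Pipeline([("lda", LinearDiscriminantAnalysis(n_components=2)),
                     ("clf", LogisticRegression(max_iter=1000))])

print("PCA pipeline accuracy:", cross_val_score(pca_pipe, X, y, cv=5).mean())
print("LDA pipeline accuracy:", cross_val_score(lda_pipe, X, y, cv=5).mean())
```

Because LDA uses the class labels, its components are chosen for separability rather than raw variance, which is why the two pipelines can score differently on the same data.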

PCA and LDA are extremely powerful tools, but they have limitations. Both are linear transformations, which means they can only create linear boundaries and capture linear qualities in our data. They are also static transformations: no matter what data we feed into PCA or LDA, the output is deterministic and follows the same mathematical recipe. If our data is not a good fit for PCA or LDA (for example, it exhibits non-linear qualities, such as classes arranged in concentric circles), then the two algorithms will not help us, no matter how extensively we grid search.
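
To make this limitation concrete, here is a small sketch using scikit-learn's make_circles toy dataset, in which the two classes form concentric circles; the dataset parameters are illustrative:

```python
# A quick demonstration of the linearity limitation on scikit-learn's
# make_circles dataset: two classes arranged as concentric circles.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)

for name, reducer in [("PCA", PCA(n_components=1)),
                      ("LDA", LinearDiscriminantAnalysis(n_components=1))]:
    pipe = Pipeline([("reduce", reducer),
                     ("clf", LogisticRegression())])
    score = cross_val_score(pipe, X, y, cv=5).mean()
    # Both scores stay far below perfect: the classes are cleanly
    # separated by radius, but not by any straight line.
    print(name, "accuracy:", round(score, 2))
```

A non-linear feature, such as the squared distance from the origin, would separate these classes perfectly, and that is exactly the kind of feature the algorithms in the next chapter can learn on their own.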

The next chapter will focus on feature learning algorithms, which are arguably the most powerful feature engineering algorithms. They are built to learn new features from the input data without making the kinds of assumptions about its shape that PCA and LDA do. There, we will use complex structures, including neural networks, to achieve the highest level of feature engineering yet.