Hands-on Machine Learning for Cybersecurity by Soma Halder

Index

Title Page
Copyright and Credits

Hands-On Machine Learning for Cybersecurity

About Packt

Why subscribe?
Packt.com

Contributors

About the authors
About the reviewers
Packt is searching for authors like you

Preface

Who this book is for
What this book covers
To get the most out of this book

Download the example code files
Download the color images
Conventions used

Get in touch

Reviews

Basics of Machine Learning in Cybersecurity

What is machine learning?

Problems that machine learning solves
Why use machine learning in cybersecurity?
Current cybersecurity solutions
Data in machine learning

Structured versus unstructured data
Labelled versus unlabelled data
Machine learning phases
Inconsistencies in data

Overfitting
Underfitting

Different types of machine learning algorithm

Supervised learning algorithms
Unsupervised learning algorithms
Reinforcement learning
Another categorization of machine learning
Classification problems
Clustering problems
Regression problems
Dimensionality reduction problems
Density estimation problems
Deep learning

Algorithms in machine learning

Support vector machines
Bayesian networks
Decision trees
Random forests
Hierarchical algorithms
Genetic algorithms
Similarity algorithms
ANNs

The machine learning architecture

Data ingestion
Data store
The model engine

Data preparation
Feature generation
Training
Testing

Performance tuning

Mean squared error
Mean absolute error
Precision, recall, and accuracy

How can model performance be improved?

Fetching the data to improve performance
Switching machine learning algorithms
Ensemble learning to improve performance

Hands-on machine learning

Python for machine learning
Comparing Python 2.x with 3.x
Python installation
Python interactive development environment

Jupyter Notebook installation

Python packages

NumPy
SciPy
Scikit-learn
pandas
Matplotlib

MongoDB with Python

Installing MongoDB
PyMongo

Setting up the development and testing environment

Use case
Data
Code

Summary

Time Series Analysis and Ensemble Modeling

What is a time series?

Time series analysis

Stationarity of a time series model
Strictly stationary process
Correlation in time series

Autocorrelation
Partial autocorrelation function

Classes of time series models

Stochastic time series model
Artificial neural network time series model
Support vector time series models
Time series components

Systematic models
Non-systematic models

Time series decomposition

Level
Trend
Seasonality
Noise

Use cases for time series

Signal processing
Stock market predictions
Weather forecasting
Reconnaissance detection

Time series analysis in cybersecurity
Time series trends and seasonal spikes

Detecting distributed denial of service with time series
Dealing with the time element in time series
Tackling the use case
Importing packages

Importing data in pandas
Data cleansing and transformation

Feature computation

Predicting DDoS attacks

ARMA
ARIMA
ARFIMA

Ensemble learning methods

Types of ensembling

Averaging
Majority vote
Weighted average

Types of ensemble algorithm

Bagging
Boosting
Stacking
Bayesian parameter averaging
Bayesian model combination
Bucket of models

Cybersecurity with ensemble techniques

Voting ensemble method to detect cyber attacks
Summary

Segregating Legitimate and Lousy URLs

Introduction to the types of abnormalities in URLs

URL blacklisting

Drive-by download URLs
Command and control URLs
Phishing URLs

Using heuristics to detect malicious pages

Data for the analysis
Feature extraction

Lexical features

Web-content-based features
Host-based features
Site-popularity features

Using machine learning to detect malicious URLs
Logistic regression to detect malicious URLs

Dataset
Model

TF-IDF

SVM to detect malicious URLs
Multiclass classification for URL classification

One-versus-rest

Summary

Knocking Down CAPTCHAs

Characteristics of CAPTCHA
Using artificial intelligence to crack CAPTCHA

Types of CAPTCHA
reCAPTCHA

No CAPTCHA reCAPTCHA

Breaking a CAPTCHA
Solving CAPTCHAs with a neural network

Dataset
Packages
Theory of CNN
Model

Code

Training the model
Testing the model

Summary

Using Data Science to Catch Email Fraud and Spam

Email spoofing

Bogus offers
Requests for help
Types of spam emails

Deceptive emails
CEO fraud
Pharming
Dropbox phishing
Google Docs phishing

Spam detection

Types of mail servers
Data collection from mail servers
Using the Naive Bayes theorem to detect spam
Laplace smoothing
Featurization techniques that convert text-based emails into numeric values

Log-space
TF-IDF
N-grams
Tokenization

Logistic regression spam filters

Logistic regression
Dataset
Python
Results

Summary

Efficient Network Anomaly Detection Using k-means

Stages of a network attack

Phase 1 – Reconnaissance
Phase 2 – Initial compromise
Phase 3 – Command and control
Phase 4 – Lateral movement
Phase 5 – Target attainment
Phase 6 – Ex-filtration, corruption, and disruption

Dealing with lateral movement in networks
Using Windows event logs to detect network anomalies

Logon/Logoff events
Account logon events
Object access events
Account management events

Active directory events

Ingesting active directory data
Data parsing
Modeling
Detecting anomalies in a network with k-means

Network intrusion data

Coding the network intrusion attack
Model evaluation

Sum of squared errors

Choosing k for k-means
Normalizing features
Manual verification

Summary

Decision Tree and Context-Based Malicious Event Detection

Adware
Bots
Bugs
Ransomware
Rootkit
Spyware
Trojan horses
Viruses
Worms
Malicious data injection within databases
Malicious injections in wireless sensors
Use case

The dataset
Importing packages
Features of the data
Model

Decision tree
Types of decision trees

Categorical variable decision tree
Continuous variable decision tree

Gini coefficient
Random forest
Anomaly detection

Isolation forest
Supervised and outlier detection with Knowledge Discovery Databases (KDD)

Revisiting malicious URL detection with decision trees
Summary

Catching Impersonators and Hackers Red Handed

Understanding impersonation
Different types of impersonation fraud

Impersonators gathering information
How an impersonation attack is constructed

Using data science to detect domains that are impersonations

Levenshtein distance

Finding domain similarity between malicious URLs
Authorship attribution

AA detection for tweets

Difference between test and validation datasets

Sklearn pipeline

Naive Bayes classifier for multinomial models
Identifying impersonation as a means of intrusion detection

Summary

Changing the Game with TensorFlow

Introduction to TensorFlow
Installation of TensorFlow
TensorFlow for Windows users
Hello world in TensorFlow
Importing the MNIST dataset
Computation graphs

What is a computation graph?

Tensor processing unit
Using TensorFlow for intrusion detection
Summary

Financial Fraud and How Deep Learning Can Mitigate It

Machine learning to detect financial fraud

Imbalanced data
Handling imbalanced datasets

Random under-sampling
Random oversampling
Cluster-based oversampling
Synthetic minority oversampling technique
Modified synthetic minority oversampling technique

Detecting credit card fraud

Logistic regression
Loading the dataset
Approach

Logistic regression classifier – under-sampled data

Tuning hyperparameters

Detailed classification reports
Predictions on test sets and plotting a confusion matrix

Logistic regression classifier – skewed data
Investigating precision-recall curve and area

Deep learning time

Adam gradient optimizer

Summary

Case Studies

Introduction to our password dataset

Text feature extraction
Feature extraction with scikit-learn
Using the cosine similarity to quantify bad passwords
Putting it all together

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think
