Index
Cover Page
Title
Copyright
Contents
Preface: Data Cleaning Pocket Primer
What Is the Goal?
Is This Book for Me and What Will I Learn?
How Were the Code Samples Created?
What You Need to Know for This Book
Which bash Commands are Excluded?
How Do I Set Up a Command Shell?
What Are the “Next Steps” after Finishing This Book?
Chapter 1: Introduction
What Is Unix?
Available Shell Types
What Is bash?
Getting Help for bash Commands
Navigating Around Directories
The history Command
Listing Filenames with the ls Command
Displaying Contents of Files
The cat Command
The head and tail Commands
The Pipe Symbol
The fold Command
File Ownership: Owner, Group, and World
Hidden Files
Handling Problematic Filenames
Working with Environment Variables
The env Command
Useful Environment Variables
Setting the PATH Environment Variable
Specifying Aliases and Environment Variables
Finding Executable Files
What Are Shell Scripts?
A Simple Shell Script
Using a Semicolon to Separate Commands
The printf Command and the echo Command
The echo Command and Whitespaces
Command Substitution (“back tick”)
Setting Environment Variables via Shell Scripts
Sourcing or “Dotting” a Shell Script
Working with Arrays
Working with Nested Loops
The paste Command
Inserting Blank Lines with the paste Command
The cut Command
Working with Metacharacters
Working with Character Classes
The “pipe” Symbol and Multiple Commands
A Simple Use Case
Another Simple Use Case
Summary
Chapter 2: Useful Commands
The join Command
The fold Command
The split Command
The sort Command
The uniq Command
How to Compare Files
The od Command
The tr Command
A Simple Use Case
The find Command
The tee Command
File Compression Commands
The tar Command
The cpio Command
The gzip and gunzip Commands
The bunzip2 Command
The zip Command
Commands for zip Files and bz Files
Internal Field Separator (IFS)
Data from a Range of Columns in a Dataset
Working with Uneven Rows in Datasets
Working with Functions in Shell Scripts
Recursion and Shell Scripts
Iterative Solutions for Factorial Values
Summary
Chapter 3: Filtering Data with grep
What Is the grep Command?
Metacharacters and the grep Command
Escaping Metacharacters with the grep Command
Useful Options for the grep Command
Character Classes and the grep Command
Working with the -c Option in grep
Matching a Range of Lines
Using Back References in the grep Command
Finding Empty Lines in Datasets
Using Keys to Search Datasets
The Backslash Character and the grep Command
Multiple Matches in the grep Command
The grep Command and the xargs Command
Searching zip Files for a String
Checking for a Unique Key Value
Redirecting Error Messages
The egrep Command and the fgrep Command
Displaying “Pure” Words in a Dataset with egrep
The fgrep Command
A Simple Use Case
Summary
Chapter 4: Transforming Data with sed
What Is the sed Command?
The sed Execution Cycle
Matching String Patterns Using sed
Substituting String Patterns Using sed
Replacing Vowels from a String or a File
Deleting Multiple Digits and Letters from a String
Search and Replace with sed
Datasets with Multiple Delimiters
Useful Switches in sed
Working with Datasets
Printing Lines
Character Classes and sed
Removing Control Characters
Counting Words in a Dataset
Back References in sed
Displaying Only “Pure” Words in a Dataset
One-Line sed Commands
Summary
Chapter 5: Doing Everything Else with awk
The awk Command
Built-in Variables That Control awk
How Does the awk Command Work?
Aligning Text with the printf Command
Conditional Logic and Control Statements
The while Statement
A for Loop in awk
A for Loop with a break Statement
The next and continue Statements
Deleting Alternate Lines in Datasets
Merging Lines in Datasets
Printing File Contents as a Single Line
Joining Groups of Lines in a Text File
Joining Alternate Lines in a Text File
Matching with Metacharacters and Character Sets
Printing Lines Using Conditional Logic
Splitting Filenames with awk
Working with Postfix Arithmetic Operators
Numeric Functions in awk
One-Line awk Commands
Useful Short awk Scripts
Printing the Words in a Text String in awk
Count Occurrences of a String in Specific Rows
Printing a String in a Fixed Number of Columns
Printing a Dataset in a Fixed Number of Columns
Aligning Columns in Datasets
Aligning Columns and Multiple Rows in Datasets
Removing a Column from a Text File
Subsets of Columns of Even Rows in Datasets
Counting Word Frequency in Datasets
Displaying Only “Pure” Words in a Dataset
Working with Multiline Records in awk
A Simple Use Case
Another Use Case
Summary
Appendix: Other Code Samples
Examples for Chapter 1
Examples for Chapter 2
Calculating Fibonacci Numbers
Calculating the GCD of Two Positive Integers
Calculating the LCM of Two Positive Integers
Calculating Prime Divisors
Examples for Chapter 3
Simulating Relational Data with the grep Command
Checking Updates in a Logfile
Examples for Chapter 4
Examples for Chapter 5
Processing Multiline Records
Adding the Contents of Records
Using the split Function in awk
Scanning Diagonal Elements in Datasets
Adding Values from Multiple Datasets (1)
Adding Values from Multiple Datasets (2)
Adding Values from Multiple Datasets (3)
Calculating Combinations of Field Values
Summary
Index