Title Page
Copyright and Credits
  Go Web Scraping Quick Start Guide
About Packt
  Why subscribe?
  Packt.com
Contributors
  About the author
  About the reviewer
  Packt is searching for authors like you
Preface
  Who this book is for
  What this book covers
  To get the most out of this book
    Download the example code files
    Conventions used
  Get in touch
    Reviews
Introducing Web Scraping and Go
  What is web scraping?
  Why do you need a web scraper?
    Search engines
    Price comparison
    Building datasets
  What is Go?
  Why is Go a good fit for web scraping?
    Go is fast
    Go is safe
    Go is simple
  How to set up a Go development environment
    Go language and tools
    Git
    Editor
  Summary
The Request/Response Cycle
  What do HTTP requests look like?
    HTTP request methods
    HTTP headers
    Query parameters
    Request body
  What do HTTP responses look like?
    Status line
    Response headers
    Response body
  What are HTTP status codes?
    100–199 range
    200–299 range
    300–399 range
    400–499 range
    500–599 range
  What do HTTP requests/responses look like in Go?
    A simple request example
  Summary
Web Scraping Etiquette
  What is a robots.txt file?
  What is a User-Agent string?
    Example
  How to throttle your scraper
  How to use caching
    Cache-Control
    Expires
    Etag
    Caching content in Go
  Summary
Parsing HTML
  What is the HTML format?
    Syntax
    Structure
  Searching using the strings package
    Example – Counting links
    Example – Doctype check
  Searching using the regexp package
    Example – Finding links
    Example – Finding prices
  Searching using XPath queries
    Example – Daily deals
    Example – Collecting products
  Searching using Cascading Style Sheets selectors
    Example – Daily deals
    Example – Collecting products
  Summary
Web Scraping Navigation
  Following links
    Example – Daily deals
  Submitting forms
    Example – Submitting searches
    Example – POST method
  Avoiding loops
  Breadth-first versus depth-first crawling
    Depth-first
    Breadth-first
  Navigating with JavaScript
    Example – Book reviews
  Summary
Protecting Your Web Scraper
  Virtual private servers
  Proxies
    Public and shared proxies
    Dedicated proxies
      Price
      Location
      Type
      Anonymity
    Proxies in Go
  Virtual private networks
  Boundaries
    Whitelists
    Blacklists
  Summary
Scraping with Concurrency
  What is concurrency
  Concurrency pitfalls
    Race conditions
    Deadlocks
  The Go concurrency model
    Goroutines
    Channels
  sync package helpers
    Conditions
    Atomic counters
  Summary
Scraping at 100x
  Components of a web scraping system
    Queue
    Cache
    Storage
    Logs
  Scraping HTML pages with colly
  Scraping JavaScript pages with chrome-protocol
    Example – Amazon Daily Deals
  Distributed scraping with dataflowkit
    The Fetch service
    The Parse service
  Summary
Other Books You May Enjoy
  Leave a review - let other readers know what you think