In the previous section, we used the strings package to count the number of links on a page. By using the regexp package, we can take this example further and retrieve the actual links with the following regular expression:
<a.*href\s*=\s*["'](http[s]{0,1}:\/\/.[^\s]*)["'].*>
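This expression matches an absolute http or https URL that appears in the href attribute of an <a> tag, and the parenthesized group captures the URL itself. It is an approximation rather than an HTML parser: because .* is greedy and . does not match a newline in Go, it finds at most one link per line of HTML, and it skips relative links.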
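To make the role of the capture group concrete, the following minimal sketch runs the same expression against a few hard-coded lines of HTML instead of a live page. The helper name extractLinks and the example.com and example.org URLs are hypothetical and used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// linkPattern is the expression from the text. The parentheses form
// capture group 1, which holds the URL.
var linkPattern = regexp.MustCompile(`<a.*href\s*=\s*["'](http[s]{0,1}:\/\/.[^\s]*)["'].*>`)

// extractLinks is a hypothetical helper that returns capture group 1
// of every match found in html.
func extractLinks(html string) []string {
	var links []string
	for _, m := range linkPattern.FindAllStringSubmatch(html, -1) {
		// m[0] is the whole matched tag; m[1] is the captured URL.
		links = append(links, m[1])
	}
	return links
}

func main() {
	// Made-up sample markup: two absolute links and one relative link.
	sample := `<a href="https://example.com/a">A</a>
<a href='http://example.org/b'>B</a>
<a href="/about">relative link, not matched</a>`

	for _, link := range extractLinks(sample) {
		fmt.Println(link)
	}
}
```

With this sample input, the sketch prints the two absolute URLs and skips the relative /about link, because the expression requires the value to start with http:// or https://.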
The following program prints all of the links on the Packt Publishing home page. The same technique could be used to collect all of the images by querying for the src attributes of <img> tags:
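package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
)

func main() {
	resp, err := http.Get("https://www.packtpub.com/")
	if err != nil {
		panic(err)
	}
	// Close the body so the underlying connection can be released.
	defer resp.Body.Close()

	// io.ReadAll supersedes ioutil.ReadAll, which is deprecated as of Go 1.16.
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	stringBody := string(data)

	// Capture group 1 holds the absolute URL found in the href attribute.
	re := regexp.MustCompile(`<a.*href\s*=\s*["'](http[s]{0,1}:\/\/.[^\s]*)["'].*>`)
	linkMatches := re.FindAllStringSubmatch(stringBody, -1)

	fmt.Printf("Found %d links:\n", len(linkMatches))
	for _, linkGroup := range linkMatches {
		// linkGroup[0] is the whole match; linkGroup[1] is the captured URL.
		fmt.Println(linkGroup[1])
	}
}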
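The image variant mentioned earlier can be sketched along the same lines. The expression below is an assumption of ours, not one given in the text: it looks for a src attribute inside an <img> tag and captures whatever sits between the quotes, including relative paths. The helper name extractImageSources and the sample markup are likewise hypothetical, and the sketch works on a hard-coded string so that it runs without network access.

```go
package main

import (
	"fmt"
	"regexp"
)

// imgPattern is an illustrative expression (not from the original text).
// [^>]* keeps the match inside a single tag, \s requires whitespace
// before src, and capture group 1 holds the quoted attribute value.
var imgPattern = regexp.MustCompile(`<img[^>]*\ssrc\s*=\s*["']([^"']+)["']`)

// extractImageSources is a hypothetical helper that returns the src
// value of every <img> tag matched in html.
func extractImageSources(html string) []string {
	var sources []string
	for _, m := range imgPattern.FindAllStringSubmatch(html, -1) {
		sources = append(sources, m[1])
	}
	return sources
}

func main() {
	// Made-up sample markup: two images and one ordinary link.
	sample := `<img src="https://example.com/banner.jpg" alt="Banner">
<img class="logo" src='/static/logo.png'>
<a href="https://example.com/">not an image</a>`

	for _, src := range extractImageSources(sample) {
		fmt.Println(src)
	}
}
```

To run this against a live page, the string produced from the response body in the program above could be passed to extractImageSources in place of the sample. Relative paths such as /static/logo.png would still need to be resolved against the page URL before they can be fetched.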