Mining the Social Web by Matthew Russell

Index

Mining the Social Web
SPECIAL OFFER: Upgrade this ebook with O’Reilly
Preface

Content Updates

February 22, 2012

To Read This Book? Or Not to Read This Book?
Tools and Prerequisites
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments

1. Introduction: Hacking on Twitter Data

Installing Python Development Tools
Collecting and Manipulating Twitter Data

Tinkering with Twitter’s API
Frequency Analysis and Lexical Diversity

What are people talking about right now?
Extracting relationships from the tweets

Visualizing Tweet Graphs
Synthesis: Visualizing Retweets with Protovis

Closing Remarks

2. Microformats: Semantic Markup and Common Sense Collide

XFN and Friends
Exploring Social Connections with XFN

A Breadth-First Crawl of XFN Data

Brief analysis of breadth-first techniques

Geocoordinates: A Common Thread for Just About Anything

Wikipedia Articles + Google Maps = Road Trip?

Plotting geo data via microform.at and Google Maps

Slicing and Dicing Recipes (for the Health of It)
Collecting Restaurant Reviews
Summary

3. Mailboxes: Oldies but Goodies

mbox: The Quick and Dirty on Unix Mailboxes
mbox + CouchDB = Relaxed Email Analysis

Bulk Loading Documents into CouchDB
Sensible Sorting
Map/Reduce-Inspired Frequency Analysis

Frequency by date/time range
Frequency by sender/recipient fields

Sorting Documents by Value
couchdb-lucene: Full-Text Indexing and More

Threading Together Conversations

Look Who’s Talking

Visualizing Mail “Events” with SIMILE Timeline
Analyzing Your Own Mail Data

The Graph Your (Gmail) Inbox Chrome Extension

Closing Remarks

4. Twitter: Friends, Followers, and Setwise Operations

RESTful and OAuth-Cladded APIs

No, You Can’t Have My Password

A Lean, Mean Data-Collecting Machine

A Very Brief Refactor Interlude
Redis: A Data Structures Server
Elementary Set Operations
Souping Up the Machine with Basic Friend/Follower Metrics
Calculating Similarity by Computing Common Friends and Followers
Measuring Influence

Constructing Friendship Graphs

Clique Detection and Analysis
The Infochimps “Strong Links” API
Interactive 3D Graph Visualization

Summary

5. Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet

Pen : Sword :: Tweet : Machine Gun (?!?)
Analyzing Tweets (One Entity at a Time)

Tapping (Tim’s) Tweets

What entities are in Tim’s tweets?
Do frequently appearing user entities imply friendship?
Splicing in the other half of the conversation

Who Does Tim Retweet Most Often?
What’s Tim’s Influence?
How Many of Tim’s Tweets Contain Hashtags?

Juxtaposing Latent Social Networks (or #JustinBieber Versus #TeaParty)

What Entities Co-Occur Most Often with #JustinBieber and #TeaParty Tweets?
On Average, Do #JustinBieber or #TeaParty Tweets Have More Hashtags?
Which Gets Retweeted More Often: #JustinBieber or #TeaParty?
How Much Overlap Exists Between the Entities of #TeaParty and #JustinBieber Tweets?

Visualizing Tons of Tweets

Visualizing Tweets with Tricked-Out Tag Clouds
Visualizing Community Structures in Twitter Search Results

Closing Remarks

6. LinkedIn: Clustering Your Professional Network for Fun (and Profit?)

Motivation for Clustering
Clustering Contacts by Job Title

Standardizing and Counting Job Titles
Common Similarity Metrics for Clustering
A Greedy Approach to Clustering

Scalable clustering sure ain’t easy
Intelligent clustering enables compelling user experiences

Hierarchical and k-Means Clustering

Hierarchical clustering
k-means clustering

Fetching Extended Profile Information
Geographically Clustering Your Network

Mapping Your Professional Network with Google Earth
Mapping Your Professional Network with Dorling Cartograms

Closing Remarks

7. Google+: TF-IDF, Cosine Similarity, and Collocations

Harvesting Google+ Data
Data Hacking with NLTK
Text Mining Fundamentals

A Whiz-Bang Introduction to TF-IDF
Querying Google+ Data with TF-IDF

Finding Similar Documents

The Theory Behind Vector Space Models and Cosine Similarity
Clustering Posts with Cosine Similarity
Visualizing Similarity with Graph Visualizations

Bigram Analysis

How the Collocation Sausage Is Made: Contingency Tables and Scoring Functions

Tapping into Your Gmail

Accessing Gmail with OAuth
Fetching and Parsing Email Messages

Before You Go Off and Try to Build a Search Engine…
Closing Remarks

8. Blogs et al.: Natural Language Processing (and Beyond)

NLP: A Pareto-Like Introduction

Syntax and Semantics
A Brief Thought Exercise

A Typical NLP Pipeline with NLTK
Sentence Detection in Blogs with NLTK
Summarizing Documents

Analysis of Luhn’s Summarization Algorithm

Entity-Centric Analysis: A Deeper Understanding of the Data

Quality of Analytics

Closing Remarks

9. Facebook: The All-in-One Wonder

Tapping into Your Social Network Data

From Zero to Access Token in Under 10 Minutes
Facebook’s Query APIs

Exploring the Graph API one connection at a time
Slicing and dicing data with FQL

Visualizing Facebook Data

Visualizing Your Entire Social Network

Visualizing with RGraphs
Visualizing with a Sunburst
Visualizing with spreadsheets (the old-fashioned way)

Visualizing Mutual Friendships Within Groups
Where Have My Friends All Gone? (A Data-Driven Game)
Visualizing Wall Data As a (Rotating) Tag Cloud

Closing Remarks

10. The Semantic Web: A Cocktail Discussion

An Evolutionary Revolution?
Man Cannot Live on Facts Alone

Open-World Versus Closed-World Assumptions
Inferencing About an Open World with FuXi

Hope

Index
About the Author
Colophon
SPECIAL OFFER: Upgrade this ebook with O’Reilly
