Neo4j in Action
Aleksa Vukotic and Nicki Watt with Tareq Abedrabbo, Dominic Fox, and Jonas Partner
Copyright
Brief Table of Contents
Table of Contents
Foreword
Preface
Acknowledgments
About this Book
Roadmap
Code conventions and downloads
Author Online forum
About the Authors
About the Cover Illustration
Part 1. Introduction to Neo4j
Chapter 1. A case for a Neo4j database
1.1. Why Neo4j?
Figure 1.1. Users and their friends represented as a graph data structure
Note
1.2. Graph data in a relational database
Figure 1.2. SQL diagram of tables representing user and friend data
Listing 1.1. SQL script defining tables for social network data
1.2.1. Querying graph data using MySQL
Note
Table 1.1. Execution times for multiple join queries using a MySQL database engine on a data set of 1,000 users
Note
Note
1.3. Graph data in Neo4j
1.3.1. Traversing the graph
Listing 1.2. Neo4j Traversal API code for finding all friends at depth 2
Figure 1.3. Traversing the social network graph data
Table 1.2. The execution times for graph traversal using Neo4j on a data set of 1,000 users
Note
Note
1.4. SQL joins versus graph traversal on a large scale
Table 1.3. The execution times for multiple join queries using a MySQL database engine on a data set of 1 million users
Note
Table 1.4. The execution times for graph traversal using Neo4j on a data set of 1 million users
1.5. Graphs around you
1.6. Neo4j in NoSQL space
Note
1.6.1. Key-value stores
1.6.2. Column-family stores
1.6.3. Document-oriented databases
1.6.4. Graph databases
1.6.5. NoSQL categories compared
Table 1.5. An overview of NoSQL categories
1.7. Neo4j: the ACID-compliant database
Note
1.8. Summary
Chapter 2. Data modeling in Neo4j
2.1. What is a data model for Neo4j?
2.1.1. Modeling with diagrams: a simple example
Figure 2.1. Users and groups in an RDBMS with a join table
Figure 2.2. A person belongs to a group.
Figure 2.3. Three people in two groups
2.1.2. Modeling with diagrams: a complex example
Figure 2.4. The entities in an access control system
Figure 2.5. Relationships between entities in the access control system
Figure 2.6. The access control model extended to support subgroups
2.2. Domain modeling
2.2.1. Entities and properties
Figure 2.7. A User with some properties
Figure 2.8. The user with some nodes converted to properties
Figure 2.9. Two users with different relationships to the same address
Figure 2.10. Two users with differently qualified relationships to the same address
Figure 2.11. Reifying the relationship between a User and an Address in order to introduce other participants
Figure 2.12. A User with additional properties stored in a related UserEx node
2.3. Further examples
2.3.1. Underground stations example
Figure 2.13. Two underground stations, with the connection between them reified as a single Track Section
Figure 2.14. Two underground stations, with connections on two different lines
2.3.2. Band members example
Figure 2.15. A simple model showing band members and a recorded album
Figure 2.16. A more complex model showing two different lineups of the same band
2.4. Summary
Chapter 3. Starting development with Neo4j
3.1. Modeling graph data structures
Figure 3.1. Users in a social network represented as boxes
Figure 3.2. Simple social network graph with users connected as friends
Note
Figure 3.3. Richer model with a name property assigned to each user
Figure 3.4. Introducing movies to the model
Figure 3.5. Introducing type properties to differentiate between User and Movie elements
Figure 3.6. Complete model of a movie-lovers’ social network
3.2. Using the Neo4j API
3.2.1. Creating nodes
Figure 3.7. Graph with only node entities
Listing 3.1. Creating a single user node in Neo4j (Java 6/Neo4j 1.9.X style)
Listing 3.2. Creating a single user node in Neo4j (Java 7/Neo4j 2.0.X style)
Listing 3.3. Creating multiple nodes in a single transaction
3.2.2. Creating relationships
Figure 3.8. Simple social network graph with users connected as friends
Note
Listing 3.4. Creating relationships between nodes using the Neo4j Core Java API
3.2.3. Adding properties to nodes
Listing 3.5. Adding name property to user nodes
Note
Table 3.1. The property types in Neo4j
Note
Listing 3.6. Adding different property types to nodes
Figure 3.9. Rich property graph representing the social network
Note
3.2.4. Node type strategies
Listing 3.7. Creating movie nodes using Neo4j Core Java API
Figure 3.10. Graph with nodes representing users and movies
Listing 3.8. Adding a type property to determine node types
Note
Figure 3.11. Nodes using the type property strategy
3.2.5. Adding properties to relationships
Listing 3.9. Creating relationships with properties
Figure 3.12. Complete model of a movie-lovers’ social network
3.3. Node labels
Note
Listing 3.10. Adding labels to nodes
Figure 3.13. Movie nodes grouped using the label MOVIE
Note
Listing 3.11. Finding nodes with a given label and property
Note
3.4. Summary
Chapter 4. The power of traversals
4.1. Traversing using the Neo4j Core Java API
Figure 4.1. The selected user and the movies he’s seen are marked with a bold border.
4.1.1. Finding the starting node
Note
4.1.2. Traversing direct relationships
Listing 4.1. Filtering movies by iterating through all relationships from the node
Note
Note
Note
Listing 4.2. Filtering movies using the Neo4j Core Java API filtering capabilities
4.1.3. Traversing second-level relationships
Figure 4.2. Nodes and relationships to follow to find movies that John’s friends like
Listing 4.3. Finding movies that have been seen by John’s friends
Listing 4.4. Finding movies that have been seen by John’s friends but not by John
4.1.4. Memory usage considerations
Listing 4.5. Using iterables to lower Java heap memory consumption
Note
4.2. Traversing using the Neo4j Traversal API
4.2.1. Using Neo4j’s built-in traversal constructs
Listing 4.6. Using the Neo4j Traversal API to find movies seen by friends
4.2.2. Implementing a custom evaluator
Table 4.1. The methods defined on the org.neo4j.graphdb.Path interface
Note
Table 4.2. The possible values of the Evaluation enumeration
Listing 4.7. Custom evaluator to exclude movies that the user has seen
Listing 4.8. Improved traversal definition with a custom evaluator
4.3. Summary
Chapter 5. Indexing the data
5.1. Creating the index entry
Figure 5.1. Index pointing to user nodes as values, using the email property as a key
Listing 5.1. Creating an index entry for a node using Neo4j API
5.2. Finding the user by their email
Figure 5.2. Looking up a user node from the index by using the email property
Listing 5.2. Finding a single user by index lookup using the email property
5.3. Dealing with more than one match
Figure 5.3. User nodes indexed by the age property, with each key potentially referencing multiple nodes
Listing 5.3. Iterating through multiple results of an index lookup operation
Note
Note
5.4. Dealing with changes to indexed data
Listing 5.4. Updating the index using sequential remove and add operations
5.5. Automatic indexing
5.5.1. Schema indexing
Listing 5.5. Using schema indexes with Java API
Listing 5.6. Updating multiple schema indexes
5.5.2. Auto-indexing
Configuring auto-indexing in standalone mode
Configuring auto-indexing in embedded mode
Using an automatically created index
5.6. The cost/benefit trade-off of indexing
Figure 5.4. Graph of a social network using intermediate nodes to differentiate between relationships to user and film nodes.
5.6.1. Performance benefit of indexing when querying
Figure 5.5. Performance of node lookup using index compared to iterating through all nodes
5.6.2. Performance overhead of indexing when updating and inserting
Figure 5.6. Average time for storing user node with and without indexing
5.6.3. Storing the index
5.7. Summary
Part 2. Application Development with Neo4j
Chapter 6. Cypher: Neo4j query language
6.1. Introduction to Cypher
6.1.1. Cypher primer
Figure 6.1. A social network graph to be queried
Listing 6.1. Traversing the graph using Java API to find all movies the user has seen
Note
6.1.2. Executing Cypher queries
Table 6.1. Tools and techniques for executing Cypher queries
Executing Cypher using the Neo4j Shell
Table 6.2. Neo4j Shell startup script syntax for Linux and Windows environments
Figure 6.2. The Neo4j Shell ready to accept commands
Note
Figure 6.3. Cypher is executed in the Neo4j Shell natively, resulting in tabular output.
Executing Cypher using the Web Admin Console
Figure 6.4. Homepage of the Neo4j Web Admin Console in the browser
Figure 6.5. Executing a Cypher query inside the Web Admin Console
Note
Executing Cypher from Java code
6.2. Cypher syntax basics
6.2.1. Pattern matching
Using node and relationship identifiers
Note
Complex Pattern Matching
Note
Note
Figure 6.6. Result of the query execution in Neo4j Shell, finding movie recommendations for the user based on the movies their friends have seen
Note
6.2.2. Finding the starting node
Node lookup by ID
Loading multiple nodes by IDs
Note
Using an index to look up the starting node(s)
Note
Note
Using a schema-based index to look up the starting node(s)
Note
Multiple start nodes in Cypher
6.2.3. Filtering data
6.2.4. Getting the results
Note
Returning properties
Note
Returning relationships
Returning paths
Note
Paging results
Note
6.3. Updating your graph with Cypher
Note
6.3.1. Creating new graph entities
6.3.2. Deleting data
6.3.3. Updating node and relationship properties
6.4. Advanced Cypher
6.4.1. Aggregation
Note
6.4.2. Functions
Note
6.4.3. Piping using the with clause
Note
6.4.4. Cypher compatibility
Note
6.5. Summary
Chapter 7. Transactions
7.1. Transaction basics
Listing 7.1. Attempting to update without a transaction
Note
7.1.1. Adding in a transaction
Listing 7.2. Attempting to update with a transaction
7.1.2. Finishing what you start and not trying to do too much in one go
Listing 7.3. A really big transaction can run out of memory
7.2. Transactions in depth
7.2.1. Transaction semantics
Durability
Isolation levels and Neo4j Locks
Figure 7.1. Default isolation level
7.2.2. Reading in a transaction and explicit read locks
Listing 7.4. Reading the same thing twice without a transaction
Listing 7.5. Reading the same thing twice with increased isolation
Figure 7.2. Explicit read locks
7.2.3. Writing in a transaction and explicit write locks
Listing 7.6. Acquiring write locks explicitly
7.2.4. The danger of deadlocks
7.3. Integration with other transaction management systems
Listing 7.7. Configuring Spring transaction manager
Listing 7.8. Declarative transaction management
7.4. Transaction events
Listing 7.9. Transaction event handlers
7.5. Summary
Chapter 8. Traversals in depth
8.1. Traversal ordering
Figure 8.1. Simple graph with nine nodes and eight relationships
8.1.1. Depth-first
Figure 8.2. Walking the graph using depth-first ordering
Listing 8.1. Walking the entire graph depth-first using the Neo4j Traversal API
Note
8.1.2. Breadth-first
Figure 8.3. Breadth-first traversal of the sample graph
Listing 8.2. Breadth-first traversal using the Neo4j Traversal API
8.1.3. Comparing depth-first and breadth-first ordering
Note
Table 8.1. The performance of a traversal depending on the location of node searched for and the traversal algorithm used
8.2. Expanding relationships
8.2.1. StandardExpander
Note
Figure 8.4. A social network of users and movies they like
Listing 8.3. Finding all movies John’s friends and colleagues like
8.2.2. Ordering relationships for expansion
Listing 8.4. Expanding relationships in the order of relationship types
8.2.3. Custom expanders
Listing 8.5. Expanding relationships based on distance from the starting node
Listing 8.6. Finding all movies John’s friends like using a custom expander
8.3. Managing uniqueness
8.3.1. NODE_GLOBAL uniqueness
Note
Figure 8.5. A simple social network graph
Listing 8.7. Finding Jane’s direct connections who can introduce her to Ben
8.3.2. NODE_PATH uniqueness
Listing 8.8. Making connections using NODE_PATH uniqueness
8.3.3. Other uniqueness types
8.4. Bidirectional traversals
Figure 8.6. Bidirectional traversal used to find the path between two nodes
Listing 8.9. Bidirectional traversal that finds paths between two users in a social network
Note
8.5. Summary
Chapter 9. Spring Data Neo4j
9.1. Where does SDN fit in?
Figure 9.1. Overview of where SDN fits within your broader application
9.1.1. What is Spring and how is SDN related to it?
9.1.2. What is SDN good for (and not good for)?
Note
9.1.3. Where to get SDN
9.1.4. Where to get more information
9.2. Modeling with SDN
Figure 9.2. Conceptual overview of the movie-lovers’ social network, with referrals
9.2.1. Initial POJO domain modeling
Listing 9.1. Initial POJO modeling attempt
Note
9.2.2. Annotating the domain model
Listing 9.2. SDN annotated domain model
9.2.3. Modeling node entities
Figure 9.3. Social network model with nodes highlighted
Properties
Indexed properties
Note
Relationships to other node entities
9.2.4. Modeling relationship entities
Figure 9.4. A social network model with highlighted relationships that could potentially be modeled as relationship entities
Note
Listing 9.3. The Viewing class as a relationship entity
Listing 9.4. User and Movie node entity snippets
9.2.5. Modeling relationships between node entities
Figure 9.5. A social network model with the relationship references between nodes highlighted
Listing 9.5. User and Movie node entity snippets
9.3. Accessing and persisting entities
9.3.1. Supporting Spring configuration
Listing 9.6. XML-based Spring configuration
Note
9.3.2. Neo4jTemplate class
Listing 9.7. A basic Neo4jTemplate example
Listing 9.8. A Neo4jTemplate example with full Spring integration
9.3.3. Repositories
Figure 9.6. Overview of SDN repository classes involved in accessing User node entity
Listing 9.9. Loading and saving data via the UserRepository
9.3.4. Other options
9.4. Object-graph mapping options
9.4.1. Simple mapping
Figure 9.7. Overview of the simple mapping logic
Listing 9.10. Loading and saving
Transitive persistence
Eager versus lazy loading
Listing 9.11. Implications for lazy loading
9.4.2. Advanced mapping based on AspectJ
Figure 9.8. Advanced mapping overview
Listing 9.12. Active record persistence with implicit transaction
9.4.3. Object mapping summary
Table 9.1. Comparison of simple and advanced mapping modes
9.5. Performing queries and traversals
Figure 9.9. Friends-of-friends submodel
9.5.1. Annotated queries
Annotation on node entities
Annotation on repository interfaces
Listing 9.13. @Query annotation on repository
9.5.2. Dynamically derived queries
Listing 9.14. Dynamically generated query methods
Handy Hint
Multiple and nested properties
And much, much more
9.5.3. Traversals
9.6. Summary
Part 3. Neo4j in Production
Chapter 10. Neo4j: embedded versus server mode
10.1. Usage modes overview
Figure 10.1. Overview of Neo4j usage modes and the main integration options for clients
Note
10.2. Embedded mode
10.2.1. Core Java integration
Figure 10.2. Typical Java-embedded deployment scenario, where the Neo4j libraries are embedded in the Java application
Required libraries
Listing 10.1. Embedded Neo4j dependencies
Listing 10.2. Dependency tree of core Neo4j embedded library
Gaining access to an embedded Neo4j database
Listing 10.3. Starting and stopping an embedded graph database
Testing in embedded mode
10.2.2. Other JVM-based integration
Figure 10.3. Other JVM-based embedded deployment approaches, involving language-specific wrappers and drivers for Neo4j
10.3. Server mode
10.3.1. Neo4j server overview
Figure 10.4. A typical Neo4j server setup with client access via the standard REST API
Installing and Using Neo4j Server
Curl and Java Client Examples
10.3.2. Using the fine-grained Neo4j server REST API
Listing 10.4. HTTP service root request and response
Listing 10.5. HTTP request and response for getting info about Adam via his userId
Listing 10.6. HTTP request and response for all of Adam’s relationships
10.3.3. Using the Cypher Neo4j server REST API endpoint
Listing 10.7. Using Cypher via REST API to get Adam’s info, including all relationships
10.3.4. Using a remote client library to help access the Neo4j server
Figure 10.5. Server-based deployment approach using remote REST client libraries
Listing 10.8. Java REST client using the java-rest-binding library
Note
10.3.5. Server plugins and unmanaged extensions
Figure 10.6. Accessing Neo4j via server plugins and unmanaged extensions
10.4. Weighing the options
Table 10.1. Advantages and disadvantages of Neo4j embedded and server modes
10.4.1. Architectural considerations
Language considerations
Separation of concerns: app concerns versus DB concerns
Figure 10.7. Two possible deployment scenarios for the social network application: embedded and server modes
Hardware considerations
10.4.2. Performance considerations
Table 10.2. The initial results of embedded versus server mode performance when creating new nodes
Table 10.3. Extended results of embedded versus server mode performance when creating new nodes
Listing 10.9. Code used for embedded performance test comparison
Listing 10.10. Code used for server performance test (RAW API) comparison
10.4.3. Other considerations
REST API: supported data exchange formats
Transactions
10.5. Getting the most out of the server mode
Table 10.4. Performance metrics log: template
10.5.1. Avoid fine-grained operations
Table 10.5. Performance metrics log after scenario 1, raw REST API
10.5.2. Using Cypher
Listing 10.11. Cypher REST request
Listing 10.12. Cypher REST response
Table 10.6. Performance metrics log after scenario 2, Cypher call
10.5.3. Server plugins
Listing 10.13. ServerPlugin class
All Nodes are Equal
Listing 10.14. Extension snippet of HTTP response for getting info on Adam node
Table 10.7. Performance metrics log after scenario 3, server plugin
10.5.4. Unmanaged extensions
Warning!
Listing 10.15. An unmanaged extension
Table 10.8. Performance metrics log after scenario 4, unmanaged extension
10.5.5. Streaming REST API
Figure 10.8. Result of turning streaming on/off
Note
10.6. Summary
Chapter 11. Neo4j in production
11.1. High-level Neo4j architecture
Figure 11.1. High-level overview of the Neo4j architecture
11.1.1. Setting the scene ...
11.1.2. Disks
What kind of disks should be used?
How much space do I need for my graph database?
11.1.3. Store files
Table 11.1. Primary store files in use and their associated properties
11.1.4. Neo4j caches
Figure 11.2. Neo4j’s use of RAM for caching
Filesystem cache
Configuring the filesystem cache
Note
Default configuration
Object cache
Configuring the object cache
Table 11.2. Object cache-type options as per the official documentation
Caching summary
11.1.5. Transaction logs and recoverability
Figure 11.3. Recap of where transaction logs fit into the overall Neo4j architecture
11.1.6. Programmatic APIs
Figure 11.4. Programmatic API stack
11.2. Neo4j High Availability (HA)
HA versus clustering
11.2.1. Neo4j clustering overview
Figure 11.5. Sample Neo4j HA cluster setup with 1 master and 2 slaves
What about CAP and ACID?
11.2.2. Setting up a Neo4j cluster
Note
Initial setup
Startup and verify
Figure 11.6. Web Admin Console view of HA setup from machine01’s perspective
Figure 11.7. Web Admin Console view of HA setup from machine02’s perspective
11.2.3. Replication—reading and writing strategies
Figure 11.8. Sequence of events when a write request is sent to the master instance
Note
Figure 11.9. Sequence of events when a write request is sent to a slave instance
To write through the slaves, or not to write through the slaves: that is the question!
11.2.4. Cache sharding
Figure 11.10. Cache sharding
Routing strategies
11.2.5. HA summary
11.3. Backups
11.3.1. Offline backups
Shut down the Neo4j instance
Copy the physical database files
Restart the database
Note
11.3.2. Online backups
Full backup
Incremental backup
The process of doing a backup
Figure 11.11. Example backup scenario for a single server setup
11.3.3. Restoring from backup
11.4. Topics we couldn’t cover but that you should be aware of
11.4.1. Security
11.4.2. Monitoring
11.5. Summary
11.6. Final thoughts
Appendix A. Installing Neo4j server
A.1. Installing and configuring a single Neo4j server
A.2. Neo4j browser
Figure A.1. The Neo4j browser splash page
Figure A.2. The Neo4j browser splash page with expanded sidebar
Figure A.3. The Neo4j browser’s graphical visualization of nodes in the system
A.3. Neo4j Web Admin Console
Figure A.4. Neo4j Web Admin Console
Appendix B. Setting up and running the sample code
B.1. Setting up your environment
Download the sample code
Install JDK (Oracle SE 7)
Remember
Install Maven (3.0.5+)
B.2. Running the demos and samples
General instructions
Chapter 10 instructions
Figure B.1. Default Maven output for chapter 10
Appendix C. Setting up your project to use SDN
C.1. Maven configuration
Listing C.1. Maven dependencies required for the simple mapping mode
Listing C.2. Maven dependencies required for the advanced mapping mode
Listing C.3. AspectJ build configuration for the advanced mapping mode
C.2. Spring configuration
Core XML configuration
Listing C.1. XML configuration using a store directory
Repository configuration
Additional Spring Data Commons Configuration Options
Appendix D. Getting more help
Index
SYMBOL
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
List of Figures
List of Tables
List of Listings