
Index

Neo4j in Action

Aleksa Vukotic and Nicki Watt with Tareq Abedrabbo, Dominic Fox, and Jonas Partner

Copyright Brief Table of Contents Table of Contents Foreword Preface Acknowledgments About this Book

Roadmap Code conventions and downloads Author Online forum

About the Authors About the Cover Illustration

Part 1. Introduction to Neo4j

Chapter 1. A case for a Neo4j database

1.1. Why Neo4j?

Figure 1.1. Users and their friends represented as a graph data structure Note

1.2. Graph data in a relational database

Figure 1.2. SQL diagram of tables representing user and friend data Listing 1.1. SQL script defining tables for social network data 1.2.1. Querying graph data using MySQL

Note Table 1.1. Execution times for multiple join queries using a MySQL database engine on a data set of 1,000 users Note Note

1.3. Graph data in Neo4j

1.3.1. Traversing the graph

Listing 1.2. Neo4j Traversal API code for finding all friends at depth 2 Figure 1.3. Traversing the social network graph data Table 1.2. The execution times for graph traversal using Neo4j on a data set of 1,000 users Note Note

1.4. SQL joins versus graph traversal on a large scale

Table 1.3. The execution times for multiple join queries using a MySQL database engine on a data set of 1 million users Note Table 1.4. The execution times for graph traversal using Neo4j on a data set of 1 million users

1.5. Graphs around you 1.6. Neo4j in NoSQL space

Note 1.6.1. Key-value stores 1.6.2. Column-family stores 1.6.3. Document-oriented databases 1.6.4. Graph databases 1.6.5. NoSQL categories compared

Table 1.5. An overview of NoSQL categories

1.7. Neo4j: the ACID-compliant database

Note

1.8. Summary

Chapter 2. Data modeling in Neo4j

2.1. What is a data model for Neo4j?

2.1.1. Modeling with diagrams: a simple example

Figure 2.1. Users and groups in an RDBMS with a join table Figure 2.2. A person belongs to a group. Figure 2.3. Three people in two groups

2.1.2. Modeling with diagrams: a complex example

Figure 2.4. The entities in an access control system Figure 2.5. Relationships between entities in the access control system Figure 2.6. The access control model extended to support subgroups

2.2. Domain modeling

2.2.1. Entities and properties

Figure 2.7. A User with some properties Figure 2.8. The user with some nodes converted to properties Figure 2.9. Two users with different relationships to the same address Figure 2.10. Two users with differently qualified relationships to the same address Figure 2.11. Reifying the relationship between a User and an Address in order to introduce other participants Figure 2.12. A User with additional properties stored in a related UserEx node

2.3. Further examples

2.3.1. Underground stations example

Figure 2.13. Two underground stations, with the connection between them reified as a single Track Section Figure 2.14. Two underground stations, with connections on two different lines

2.3.2. Band members example

Figure 2.15. A simple model showing band members and a recorded album Figure 2.16. A more complex model showing two different lineups of the same band

2.4. Summary

Chapter 3. Starting development with Neo4j

3.1. Modeling graph data structures

Figure 3.1. Users in a social network represented as boxes Figure 3.2. Simple social network graph with users connected as friends Note Figure 3.3. Richer model with a name property assigned to each user Figure 3.4. Introducing movies to the model Figure 3.5. Introducing type properties to differentiate between User and Movie elements Figure 3.6. Complete model of a movie-lover’s social network

3.2. Using the Neo4j API

3.2.1. Creating nodes

Figure 3.7. Graph with only node entities Listing 3.1. Creating a single user node in Neo4j (Java 6/Neo4j 1.9.X style) Listing 3.2. Creating single user node in Neo4j (Java 7/Neo4j 2.0.X style) Listing 3.3. Creating multiple nodes in a single transaction

3.2.2. Creating relationships

Figure 3.8. Simple social network graph with users connected as friends Note Listing 3.4. Creating relationships between nodes using the Neo4j Core Java API

3.2.3. Adding properties to nodes

Listing 3.5. Adding name property to user nodes Note Table 3.1. The property types in Neo4j Note Listing 3.6. Adding different property types to nodes Figure 3.9. Rich property graph representing the social network Note

3.2.4. Node type strategies

Listing 3.7. Creating movie nodes using Neo4j Core Java API Figure 3.10. Graph with nodes representing users and movies Listing 3.8. Adding a type property to determine node types Note Figure 3.11. Nodes using the type property strategy

3.2.5. Adding properties to relationships

Listing 3.9. Creating relationships with properties Figure 3.12. Complete model of a movie-lovers’ social network

3.3. Node labels

Note Listing 3.10. Adding labels to nodes Figure 3.13. Movie nodes grouped using the label MOVIE Note Listing 3.11. Finding nodes with a given label and property Note

3.4. Summary

Chapter 4. The power of traversals

4.1. Traversing using the Neo4j Core Java API

Figure 4.1. The selected user and the movies he’s seen are marked with a bold border. 4.1.1. Finding the starting node

Note

4.1.2. Traversing direct relationships

Listing 4.1. Filtering movies by iterating through all relationships from the node Note Note Note Listing 4.2. Filtering movies using the Neo4j Core Java API filtering capabilities

4.1.3. Traversing second-level relationships

Figure 4.2. Nodes and relationships to follow to find movies that John’s friends like Listing 4.3. Finding movies that have been seen by John’s friends Listing 4.4. Finding movies that have been seen by John’s friends but not by John

4.1.4. Memory usage considerations

Listing 4.5. Using iterables to lower Java heap memory consumption Note

4.2. Traversing using the Neo4j Traversal API

4.2.1. Using Neo4j’s built-in traversal constructs

Listing 4.6. Using the Neo4j Traversal API to find movies seen by friends

4.2.2. Implementing a custom evaluator

Table 4.1. The methods defined on the org.neo4j.graphdb.Path interface Note Table 4.2. The possible values of the Evaluation enumeration Listing 4.7. Custom evaluator to exclude movies that the user has seen Listing 4.8. Improved traversal definition with a custom evaluator

4.3. Summary

Chapter 5. Indexing the data

5.1. Creating the index entry

Figure 5.1. Index pointing to user nodes as values, using the email property as a key Listing 5.1. Creating an index entry for a node using Neo4j API

5.2. Finding the user by their email

Figure 5.2. Looking up a user node from the index by using the email property Listing 5.2. Finding a single user by index lookup using the email property

5.3. Dealing with more than one match

Figure 5.3. User nodes indexed by the age property, with each key potentially referencing multiple nodes Listing 5.3. Iterating through multiple results of an index lookup operation Note Note

5.4. Dealing with changes to indexed data

Listing 5.4. Updating the index using sequential remove and add operations

5.5. Automatic indexing

5.5.1. Schema indexing

Listing 5.5. Using schema indexes with Java API Listing 5.6. Updating multiple schema indexes

5.5.2. Auto-indexing

Configuring auto-indexing in standalone mode Configuring auto-indexing in embedded mode Using an automatically created index

5.6. The cost/benefit trade-off of indexing

Figure 5.4. Graph of a social network using intermediate nodes to differentiate between relationships to user and film nodes. 5.6.1. Performance benefit of indexing when querying

Figure 5.5. Performance of node lookup using index compared to iterating through all nodes

5.6.2. Performance overhead of indexing when updating and inserting

Figure 5.6. Average time for storing user node with and without indexing

5.6.3. Storing the index

5.7. Summary

Part 2. Application Development with Neo4j

Chapter 6. Cypher: Neo4j query language

6.1. Introduction to Cypher

6.1.1. Cypher primer

Figure 6.1. A social network graph to be queried Listing 6.1. Traversing the graph using Java API to find all movies the user has seen Note

6.1.2. Executing Cypher queries

Table 6.1. Tools and techniques for executing Cypher queries Executing Cypher using the Neo4j Shell Table 6.2. Neo4j Shell startup script syntax for Linux and Windows environments Figure 6.2. The Neo4j Shell ready to accept commands Note Figure 6.3. Cypher is executed in the Neo4j Shell natively, resulting in tabular output. Executing Cypher using the Web Admin Console Figure 6.4. Homepage of the Neo4j Web Admin Console in the browser Figure 6.5. Executing a Cypher query inside the Web Admin Console Note Executing Cypher from Java code

6.2. Cypher syntax basics

6.2.1. Pattern matching

Using node and relationship identifiers Note Complex Pattern Matching Note Note Figure 6.6. Result of the query execution in Neo4j Shell, finding movie recommendations for the user based on the movies their friends have seen Note

6.2.2. Finding the starting node

Node lookup by ID Loading multiple nodes by IDs Note Using an index to look up the starting node(s) Note Note Using a schema-based index to look up the starting node(s) Note Multiple start nodes in Cypher

6.2.3. Filtering data 6.2.4. Getting the results

Note Returning properties Note Returning relationships Returning paths Note Paging results Note

6.3. Updating your graph with Cypher

Note 6.3.1. Creating new graph entities 6.3.2. Deleting data 6.3.3. Updating node and relationship properties

6.4. Advanced Cypher

6.4.1. Aggregation

Note

6.4.2. Functions

Note

6.4.3. Piping using the with clause

Note

6.4.4. Cypher compatibility

Note

6.5. Summary

Chapter 7. Transactions

7.1. Transaction basics

Listing 7.1. Attempting to update without a transaction Note 7.1.1. Adding in a transaction

Listing 7.2. Attempting to update with a transaction

7.1.2. Finishing what you start and not trying to do too much in one go

Listing 7.3. A really big transaction can run out of memory

7.2. Transactions in depth

7.2.1. Transaction semantics

Durability Isolation levels and Neo4j Locks Figure 7.1. Default isolation level

7.2.2. Reading in a transaction and explicit read locks

Listing 7.4. Reading the same thing twice without a transaction Listing 7.5. Reading the same thing twice with increased isolation Figure 7.2. Explicit read locks

7.2.3. Writing in a transaction and explicit write locks

Listing 7.6. Acquiring write locks explicitly

7.2.4. The danger of deadlocks

7.3. Integration with other transaction management systems

Listing 7.7. Configuring Spring transaction manager Listing 7.8. Declarative transaction management

7.4. Transaction events

Listing 7.9. Transaction event handlers

7.5. Summary

Chapter 8. Traversals in depth

8.1. Traversal ordering

Figure 8.1. Simple graph with nine nodes and eight relationships 8.1.1. Depth-first

Figure 8.2. Walking the graph using depth-first ordering Listing 8.1. Walking the entire graph depth-first using the Neo4j Traversal API Note

8.1.2. Breadth-first

Figure 8.3. Breadth-first traversal of the sample graph Listing 8.2. Breadth-first traversal using the Neo4j Traversal API

8.1.3. Comparing depth-first and breadth-first ordering

Note Table 8.1. The performance of a traversal depending on the location of node searched for and the traversal algorithm used

8.2. Expanding relationships

8.2.1. StandardExpander

Note Figure 8.4. A social network of users and movies they like Listing 8.3. Finding all movies John’s friends and colleagues like

8.2.2. Ordering relationships for expansion

Listing 8.4. Expanding relationships in the order of relationship types

8.2.3. Custom expanders

Listing 8.5. Expanding relationships based on distance from the starting node Listing 8.6. Finding all movies John’s friends like using a custom expander

8.3. Managing uniqueness

8.3.1. NODE_GLOBAL uniqueness

Note Figure 8.5. A simple social network graph Listing 8.7. Finding Jane’s direct connections who can introduce her to Ben

8.3.2. NODE_PATH uniqueness

Listing 8.8. Making connections using NODE_PATH uniqueness

8.3.3. Other uniqueness types

8.4. Bidirectional traversals

Figure 8.6. Bidirectional traversal used to find the path between two nodes Listing 8.9. Bidirectional traversal that finds paths between two users in a social network Note

8.5. Summary

Chapter 9. Spring Data Neo4j

9.1. Where does SDN fit in?

Figure 9.1. Overview of where SDN fits within your broader application 9.1.1. What is Spring and how is SDN related to it? 9.1.2. What is SDN good for (and not good for)?

Note

9.1.3. Where to get SDN 9.1.4. Where to get more information

9.2. Modeling with SDN

Figure 9.2. Conceptual overview of the movie-lovers’ social network, with referrals 9.2.1. Initial POJO domain modeling

Listing 9.1. Initial POJO modeling attempt Note

9.2.2. Annotating the domain model

Listing 9.2. SDN annotated domain model

9.2.3. Modeling node entities

Figure 9.3. Social network model with nodes highlighted Properties Indexed properties Note Relationships to other node entities

9.2.4. Modeling relationship entities

Figure 9.4. A social network model with highlighted relationships that could potentially be modeled as relationship entities Note Listing 9.3. The Viewing class as a relationship entity Listing 9.4. User and Movie node entity snippets

9.2.5. Modeling relationships between node entities

Figure 9.5. A social network model with the relationship references between nodes highlighted Listing 9.5. User and Movie node entity snippets

9.3. Accessing and persisting entities

9.3.1. Supporting Spring configuration

Listing 9.6. XML-based Spring configuration Note

9.3.2. Neo4jTemplate class

Listing 9.7. A basic Neo4jTemplate example Listing 9.8. A Neo4jTemplate example with full Spring integration

9.3.3. Repositories

Figure 9.6. Overview of SDN repository classes involved in accessing User node entity Listing 9.9. Loading and saving data via the UserRepository

9.3.4. Other options

9.4. Object-graph mapping options

9.4.1. Simple mapping

Figure 9.7. Overview of the simple mapping logic Listing 9.10. Loading and saving Transitive persistence Eager versus lazy loading Listing 9.11. Implications for lazy loading

9.4.2. Advanced mapping based on AspectJ

Figure 9.8. Advanced mapping overview Listing 9.12. Active record persistence with implicit transaction

9.4.3. Object mapping summary

Table 9.1. Comparison of simple and advanced mapping modes

9.5. Performing queries and traversals

Figure 9.9. Friends-of-friends submodel 9.5.1. Annotated queries

Annotation on node entities Annotation on repository interfaces Listing 9.13. @Query annotation on repository

9.5.2. Dynamically derived queries

Listing 9.14. Dynamically generated query methods Handy Hint Multiple and nested properties And much, much more

9.5.3. Traversals

9.6. Summary

Part 3. Neo4j in Production

Chapter 10. Neo4j: embedded versus server mode

10.1. Usage modes overview

Figure 10.1. Overview of Neo4j usage modes and the main integration options for clients Note

10.2. Embedded mode

10.2.1. Core Java integration

Figure 10.2. Typical Java-embedded deployment scenario, where the Neo4j libraries are embedded in the Java application Required libraries Listing 10.1. Embedded Neo4j dependencies Listing 10.2. Dependency tree of core Neo4j embedded library Gaining access to an embedded Neo4j database Listing 10.3. Starting and stopping an embedded graph database Testing in embedded mode

10.2.2. Other JVM-based integration

Figure 10.3. Other JVM-based embedded deployment approaches, involving language-specific wrappers and drivers for Neo4j

10.3. Server mode

10.3.1. Neo4j server overview

Figure 10.4. A typical Neo4j server setup with client access via the standard REST API Installing and Using Neo4j Server Curl and Java Client Examples

10.3.2. Using the fine-grained Neo4j server REST API

Listing 10.4. HTTP service root request and response Listing 10.5. HTTP request and response for getting info about Adam via his userId Listing 10.6. HTTP request and response for all of Adam’s relationships

10.3.3. Using the Cypher Neo4j server REST API endpoint

Listing 10.7. Using Cypher via REST API to get Adam’s info, including all relationships

10.3.4. Using a remote client library to help access the Neo4j server

Figure 10.5. Server-based deployment approach using remote REST client libraries Listing 10.8. Java REST client using the java-rest-binding library Note

10.3.5. Server plugins and unmanaged extensions

Figure 10.6. Accessing Neo4j via server plugins and unmanaged extensions

10.4. Weighing the options

Table 10.1. Advantages and disadvantages of Neo4j embedded and server modes 10.4.1. Architectural considerations

Language considerations Separation of concerns: app concerns versus DB concerns Figure 10.7. Two possible deployment scenarios for the social network application: embedded and server modes Hardware considerations

10.4.2. Performance considerations

Table 10.2. The initial results of embedded versus server mode performance when creating new nodes Table 10.3. Extended results of embedded versus server mode performance when creating new nodes Listing 10.9. Code used for embedded performance test comparison Listing 10.10. Code used for server performance test (RAW API) comparison

10.4.3. Other considerations

REST API: supported data exchange formats Transactions

10.5. Getting the most out of the server mode

Table 10.4. Performance metrics log: template 10.5.1. Avoid fine-grained operations

Table 10.5. Performance metrics log after scenario 1, raw REST API

10.5.2. Using Cypher

Listing 10.11. Cypher REST request Listing 10.12. Cypher REST response Table 10.6. Performance metrics log after scenario 2, Cypher call

10.5.3. Server plugins

Listing 10.13. ServerPlugin class All Nodes are Equal Listing 10.14. Extension snippet of HTTP response for getting info on Adam node Table 10.7. Performance metrics log after scenario 3, server plugin

10.5.4. Unmanaged extensions

Warning! Listing 10.15. An unmanaged extension Table 10.8. Performance metrics log after scenario 4, unmanaged extension

10.5.5. Streaming REST API

Figure 10.8. Result of turning streaming on/off Note

10.6. Summary

Chapter 11. Neo4j in production

11.1. High-level Neo4j architecture

Figure 11.1. High-level overview of the Neo4j architecture 11.1.1. Setting the scene ... 11.1.2. Disks

What kind of disks should be used? How much space do I need for my graph database?

11.1.3. Store files

Table 11.1. Primary store files in use and their associated properties

11.1.4. Neo4j caches

Figure 11.2. Neo4j’s use of RAM for caching Filesystem cache Configuring the filesystem cache Note Default configuration Object cache Configuring the object cache Table 11.2. Object cache-type options as per the official documentation Caching summary

11.1.5. Transaction logs and recoverability

Figure 11.3. Recap of where transaction logs fit into the overall Neo4j architecture

11.1.6. Programmatic APIs

Figure 11.4. Programmatic API stack

11.2. Neo4j High Availability (HA)

HA versus clustering 11.2.1. Neo4j clustering overview

Figure 11.5. Sample Neo4j HA cluster setup with 1 master and 2 slaves What about CAP and ACID?

11.2.2. Setting up a Neo4j cluster

Note Initial setup Startup and verify Figure 11.6. Web Admin Console view of HA setup from machine01’s perspective Figure 11.7. Web Admin Console view of HA setup from machine02’s perspective

11.2.3. Replication—reading and writing strategies

Figure 11.8. Sequence of events when a write request is sent to the master instance Note Figure 11.9. Sequence of events when a write request is sent to a slave instance To write through the slaves—or not to write through the slaves—that is the question!

11.2.4. Cache sharding

Figure 11.10. Cache sharding Routing strategies

11.2.5. HA summary

11.3. Backups

11.3.1. Offline backups

Shut down the Neo4j instance Copy the physical database files Restart the database Note

11.3.2. Online backups

Full backup Incremental backup The process of doing a backup Figure 11.11. Example backup scenario for a single server setup

11.3.3. Restoring from backup

11.4. Topics we couldn’t cover but that you should be aware of

11.4.1. Security 11.4.2. Monitoring

11.5. Summary 11.6. Final thoughts

Appendix A. Installing Neo4j server

A.1. Installing and configuring a single Neo4j server A.2. Neo4j browser

Figure A.1. The Neo4j browser splash page Figure A.2. The Neo4j browser splash page with expanded sidebar Figure A.3. The Neo4j browser’s graphical visualization of nodes in the system

A.3. Neo4j Web Admin Console

Figure A.4. Neo4j Web Admin Console

Appendix B. Setting up and running the sample code

B.1. Setting up your environment

Download the sample code Install JDK (Oracle SE 7)

Remember

Install Maven (3.0.5+)

B.2. Running the demos and samples

General instructions Chapter 10 instructions

Figure B.1. Default Maven output for chapter 10

Appendix C. Setting up your project to use SDN

C.1. Maven configuration

Listing C.1. Maven dependencies required for the simple mapping mode Listing C.2. Maven dependencies required for the advanced mapping mode Listing C.3. AspectJ build configuration for the advanced mapping mode

C.2. Spring configuration

Core XML configuration

Listing C.1. XML configuration using a store directory

Repository configuration

Additional Spring Data Commons Configuration Options

Appendix D. Getting more help Index

SYMBOL A B C D E F G H I J K L M N O P Q R S T U V W

List of Figures List of Tables List of Listings
