Index
Foreword
Preface
Goals and Audience
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
1. Architecture and Data Model
Recent Trends
The Role of Databases
Distributed Applications
Fast Random Access
Accessing Sorted Versus Unsorted Data
Versions
History
Data Model
Rows and Columns
Data Modification and Timestamps
Advanced Data Model Components
Column Families
Column Visibility
Full Data Model
Tables
Introduction to the Client API
Approach to Rows
Exploiting Sort Order
Architecture Overview
ZooKeeper
Hadoop
Accumulo
Tablet servers
Master
Garbage collector
Monitor
Client
Thrift proxy
A Typical Cluster
Additional Features
Automatic Data Partitioning
High Consistency
Automatic Load Balancing
Massive Scalability
Failure Tolerance and Automatic Recovery
Support for Analysis: Iterators
Support for Analysis: MapReduce Integration
Data Lifecycle Management
Compression
Robust Timestamps
Accumulo and Other Data Management Systems
Comparisons to Relational Databases
SQL
Transactions
Normalization
Comparisons to Other NoSQL Databases
Data model
Key ordering
Tight Hadoop integration
High versus eventual consistency
Column visibility and access control
Iterators
Dynamic column families and locality groups
Support for very large rows
Parallelized BatchScanners
Namespaces
Use Cases Suited for Accumulo
A New Kind of Flexible Analytical Warehouse
Building the Next Gmail
Massive Graph or Machine-Learning Problems
Relieving Relational Databases
Massive Search Applications
Applications with a Long History of Versioned Data
2. Quick Start
Demo of the Shell
The help Command
Creating a Table and Inserting Some Data
Scanning for Data
Using Authorizations
Using a Simple Iterator
Demo of Java Code
Creating a Table and Inserting Some Data
Scanning for Data
Using Authorizations
Using a Simple Iterator
A More Complete Installation
Other Important Resources
One Last Example with a Unit Test
Additional Resources
3. Basic API
Development Environment
Obtaining the Client Library
Using Maven
Using Maven with an IDE
Configuring the Classpath
Introduction to the Example Application: Wikipedia Pages
Wikipedia Data
Data Modeling
Obtaining Example Code
Downloading Sample Wikipedia Pages
Downloading All English Wikipedia Articles
Connect
Insert
Committing Mutations
Handling Errors
Insert Example
Using Lexicoders
Writing to Multiple Tables
Lookups and Scanning
Lookup Example
Crafting Ranges
Grouping by Rows
Reusing Scanners
Isolated Row Views
Tuning Scanners
Batch Scanning
Update: Overwrite
Overwrite Example
Allowing Multiple Versions
Update: Appending or Incrementing
Update: Read-Modify-Write and Conditional Mutations
Conditional Mutation API
Conditional Mutation Batch API
Conditional Mutation Example
Delete
Deleting and Reinserting
Removing Deleted Data from Disk
Batch Deleter
Testing
MockAccumulo
MiniAccumuloCluster
4. Table API
Basic Table Operations
Creating Tables
Options for creating tables
Renaming
Deleting Tables
Deleting Ranges of Rows
Deleting Entries Returned from a Scan
Configuring Table Properties
Locality Groups
Locality groups example
Bloom Filters
Key functors
Caching
Tablet Splits
Quickly and automatically splitting
Merging tablets
Compacting
Compaction properties
Additional Properties
Online Status
Cloning
Using cloning as a snapshotting mechanism
Importing and Exporting Tables
Additional Administrative Methods
Table Namespaces
Creating
Renaming
Setting Namespace Properties
Deleting
Configuring Iterators
Configuring Constraints
Testing Class Loading for a Namespace
Instance Operations
Setting Properties
Configuration
Cluster Information
Precedence of Properties
5. Security API
Authentication
Permissions
System Permissions
Namespace Permissions
Table Permissions
Authorizations
Column Visibilities
Limiting Authorizations Written
An Example of Using Authorizations
Using a Default Visibility
Making Authorizations Work
Auditing Security Operations
Custom Authentication, Permissions, and Authorization
Custom Authentication Example
Other Security Considerations
Using an Application Account for Multiple Users
Network
Disk Encryption
6. Server-Side Functionality and External Clients
Constraints
Constraint Configuration API
Constraint Configuration Example
Creating Custom Constraints
Custom Constraint Example
Iterators
Iterator Configuration API
VersioningIterator
Iterator Configuration Example
Adding Iterators by Setting Properties
Filtering Iterators
Built-in filters
Custom filters
Custom filtering iterator example
Combiners
Combiners for incrementing or appending updates
Built-in combiners
Custom combiners
Custom combiner example
Other Built-in Iterators
WholeRowIterator example
Low-level iterator API
Thrift Proxy
Starting a Proxy
Python Example
Generating Client Code
Language-Specific Clients
Integration with Other Tools
Apache Hive
Table options
Serializing values
Additional options
Hive example
Optimizing Hive queries
Apache Pig
Pig example
Apache Kafka
Integration with Analytical Tools
7. MapReduce API
Formats
Writing Worker Classes
MapReduce Example
MapReduce over Underlying RFiles
Example of Running a MapReduce Job over RFiles
Delivering Rows to Map Workers
Ingesters and Combiners as MapReduce Computations
MapReduce and Bulk Import
Bulk Ingest to Avoid Duplicates
8. Table Design
Single-Table Designs
Implementing Paging
Secondary Indexing
Index Partitioned by Term
Querying a Term-Partitioned Index
Combining query terms
Querying for a term in a specific field
Maintaining Consistency Across Tables
Using MultiTableBatchWriter for consistency
Index Partitioned by Document
Querying a Document-Partitioned Index
Indexing Data Types
Using Lexicoders in indexing
Custom Lexicoder example: Inet4AddressLexicoder
Full-Text Search
wikipediaMetadata
wikipediaIndex
wikipedia
wikipediaReverseIndex
Ingesting WikiSearch Data
Querying the WikiSearch Data
Designing Row IDs
Lexicoders
Composite Row IDs
Key Size
Avoiding Hotspots
Designing Row IDs for Consistent Updates
Designing Values
Storing Files and Large Values
Human-Readable Versus Binary Values and Formatters
Designing Authorizations
Designing Column Visibilities
9. Advanced Table Designs
Time-Ordered Data
Graphs
Building an Example Graph: Twitter
Traversing Graph Tables
Traversing the Example Twitter Graph
Blueprints for Accumulo
Titan
Semantic Triples
Semantic Triples Example
Spatial Data
Open Source Projects
Space-Filling Curves
Multidimensional Data
D4M and Matlab
D4M Example
Adding D4M to Octave or Matlab
Loading example data
Load example data using Java
Machine Learning
Storing Feature Vectors
A Machine-Learning Example
Approximating Relational and SQL Database Properties
Schema Constraints
SQL Operations
SELECT
WHERE
JOIN, GROUP BY, and ORDER BY
Strategies for Joins
GROUP BY and ORDER BY
10. Internals
Tablet Server
Write Path
Read Path
Resource Manager
Minor compaction
Major compaction
Merging minor compaction
Splits
Write-Ahead Logs
Recovery
File formats
RFile optimizations
Relative key encoding
Locality groups
Bloom filters
Caching
Master
FATE
Load Balancer
Garbage Collector
Monitor
Tracer
Client
Locating Keys
Metadata Table
Uses of ZooKeeper
Accumulo and the CAP Theorem
11. Administration: Setup
Preinstallation
Operating Systems
Kernel Tweaks
Swappiness
Number of open files
Native Libraries
User Accounts
Linux Filesystem
System Services
Software Dependencies
Apache Hadoop
Apache ZooKeeper
Installation
Tarball Distribution Install
Installing on Cloudera’s CDH
Installing on Hortonworks’ HDP
Installing on MapR
Running via Amazon Web Services
Building from Source
Building a tarball distribution
Building native libraries
Configuration
File Permissions
Server Configuration Files
accumulo-env.sh
accumulo-site.xml
Client Configuration
Deploying JARs
Using lib/ext/
Custom JAR loading example
Using HDFS
Setting Up Automatic Failover
Initialization
To reinitialize
Multiple instances
Running Very Large-Scale Clusters
Networking
Limits
Metadata Table
Tablet Sizing
File Sizing
Using Multiple HDFS Volumes
Handling NameNode hostname changes
Security
Column Visibilities and Accumulo Clients
Supporting Software Security
Network Security
Configuring SSL
Encryption of Data at Rest
Kerberized Hadoop
Application Permissions
12. Administration: Running
Starting Accumulo
Via the start-all.sh Script
Via init.d Scripts
Stopping Accumulo
Via the stop-all.sh Script
Via init.d scripts
Stopping Individual Processes
Starting After a Crash
Monitoring
Monitor Web Service
Overview
Master Server View
Tablet Servers View
Server Activity View
Garbage Collector View
Tables View
Recent Traces View
Documentation View
Recent Logs View
JMX Metrics
Logging
Tracing
Tracing in the shell
Cluster Changes
Adding New Worker Nodes
Removing Worker Nodes
Adding New Control Nodes
Removing Control Nodes
Table Operations
Changing Settings
Altering load balancing
Configuring iterators
Safely deploying custom iterators
Changing Online Status
Cloning
Altering cloned table properties
Cloning for MapReduce
Import, Export, and Backups
Exporting a table
Importing an exported table
Bulk-loading files from a MapReduce job
Data Lifecycle
Versioning
Data Age-off
Ensuring that deletes are removed from tables
Compactions
Using major compaction to apply changes
Compacting specific ranges
Merging Tablets
Garbage Collection
Failure Recovery
Typical Failures
Single machine failure
Single machine unresponsiveness
Network partitions
More-Serious Failures
All NameNodes failing simultaneously
All ZooKeeper servers failing simultaneously
Power loss to the data center
Loss of all replicas of an HDFS data block
Tips for Restoring a Cluster
Replay data
Back up NameNode metadata
Back up table configuration, users, and split points
Turn on HDFS trash
Create an empty RFile
Take Hadoop out of safe mode manually
Troubleshooting
Ensure that processes are running
Check log messages
Understand network partitions
Exception when scanning a table in the shell
Graphs on the monitor are “blocky”
Tablets not balancing across tablet servers
Calculate the size of changes to a cloned table
Unexpected or unexplained query results
Slow queries
Look at ZooKeeper
Use the listscans command
Look at user-initiated compactions
Inspect RFiles
13. Performance
Understanding Read Performance
Understanding Write Performance
BatchWriters
Bulk Loading
Hardware Selection
Storage Devices
Hard disk drives
Storage-area networks
Solid-state disks
Networking
Virtualization
Running in a Public Cloud Environment
Cluster Sizing
Modeling Required Write Performance
Cluster Planning Example
Estimated total volume of data
Types of user requests and indexes required
Compactions
Rate of incoming data
Age-off strategy
Analyzing Performance
Using Tracing
Using the Monitor
Using Local Logs
Tablet Server Tuning
External Settings
HDFS threads used to transfer data
HDFS durable sync
Memory Settings
tserver.memory.maps.max
tserver.memory.maps.native.enabled
Cache settings
Java heap size
tserver.mutation.queue.max
Write-Ahead Log Settings
tserver.wal.replication
tserver.wal.sync
tserver.wal.sync.method
Resource Settings
tserver.compaction.major.concurrent.max
tserver.compaction.minor.concurrent.max
tserver.readahead.concurrent.max
Timeouts
Scaling Vertically
Cluster Tuning
Splitting Tables
Balancing Tablets
Balancing Reads and Writes
Data Locality
Sharing ZooKeeper
A. Shell Commands Quick Reference
Debugging
Exiting
Help
Iterator
Permissions Administration
Shell Execution
Shell State
Table Administration
Table Control
User Administration
Writing, Reading, and Removing Data
B. Metadata Table
Row ID
File Column Family
Scan Column Family
future, last, and loc Column Families
log Column Family
srv Column Family
~tab:~pr Column
Other Columns
C. Data Stored in ZooKeeper
masters, tservers, gc, monitor, and tracers Nodes
problems/problem_info Nodes
root_tablet Node
tables/table_id Nodes
config/system_property_name Node
users/username Nodes
Other Nodes
Index