Monitoring with Ganglia
Preface
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
1. Introducing Ganglia
It’s a Problem of Scale
Hosts ARE the Monitoring System
Redundancy Breeds Organization
Is Ganglia Right for You?
gmond: Big Bang in a Few Bytes
gmetad: Bringing It All Together
gweb: Next-Generation Data Analysis
But Wait! That’s Not All!
2. Installing and Configuring Ganglia
Installing Ganglia
gmond
Requirements
Linux
Debian-based distributions
RPM-based distributions
OS X
Solaris
Other platforms
gmetad
Requirements
Linux
Debian-based distributions
RPM-based distributions
OS X
Solaris
gweb
Requirements
Linux
Debian-based distributions
RPM-based distributions
OS X
Solaris
Configuring Ganglia
gmond
Topology considerations
Configuration file
Section: globals
Section: cluster
Section: host
Section: UDP channels
Section: TCP Accept Channels
Access control
Optional section: sFlow
Section: modules
Section: collection_group
gmetad
gmetad topology
gmetad.conf: gmetad configuration file
The data_source attribute
gmetad daemon behavior
RRDtool attributes
Graphite support
gmetad interactive port query syntax
gweb
Apache virtual host configuration
gweb options
Application settings
Look and feel
Security
Advanced features
Postinstallation
Starting Up the Processes
Testing Your Installation
Firewalls
3. Scalability
Who Should Be Concerned About Scalability?
gmond and Ganglia Cluster Scalability
gmetad Storage Planning and Scalability
RRD File Structure and Scalability
Acute IO Demand During gmetad Startup
gmetad IO Demand During Normal Operation
Forecasting IO Workload
Testing the IO Subsystem
Dealing with High IO Demand from gmetad
4. The Ganglia Web Interface
Navigating the Ganglia Web Interface
The gweb Main Tab
Grid View
Cluster View
Physical view
Adjusting the time range
Host View
Viewing individual metrics
Node view
Graphing All Time Periods
The gweb Search Tab
The gweb Views Tab
The gweb Aggregated Graphs Tab
Decompose Graphs
The gweb Compare Hosts Tab
The gweb Events Tab
Events API
Examples
The gweb Automatic Rotation Tab
The gweb Mobile Tab
Custom Composite Graphs
Other Features
Authentication and Authorization
Configuration
Enabling Authentication
Sample Apache configuration
Other web servers
Access Controls
Actions
Configuration Examples
5. Managing and Extending Metrics
gmond: Metric Gathering Agent
Base Metrics
Extended Metrics
Extending gmond with Modules
C/C++ Modules
Anatomy of a C/C++ module
mmodule structure
Ganglia_25metric structure
metric_init callback function
metric_cleanup function
metric_handler function
Configuring a C/C++ metric module
Deploying a C/C++ metric module
Cloning and building a C/C++ module with autotools
Adding a module within either project
Creating a new project
Putting it all together with autotools
Mod_Python
Configuring gmond to support Python metric modules
Writing a Python metric module
Debugging and testing a Python metric module
Configuring a Python metric module
Deploying a Python metric module
Spoofing with Modules
Extending gmond with gmetric
Running gmetric from the Command Line
Spoofing with gmetric
How to Choose Between C/C++, Python, and gmetric
XDR Protocol
Packets
Implementations
Java and gmetric4j
Real World: GPU Monitoring with the NVML Module
Installation
Metrics
Configuration
6. Troubleshooting Ganglia
Overview
Known Bugs and Other Limitations
Useful Resources
Release Notes
Manpages
Wiki
IRC
Mailing Lists
Bug Tracker
Monitoring the Monitoring System
General Troubleshooting Mechanisms and Tools
netcat and telnet
Logs
Running in Foreground/Debug Mode
strace and truss
valgrind: Memory Leaks and Memory Corruption
iostat: Checking IOPS Demands of gmetad
Restarting Daemons
gstat
Common Deployment Issues
Reverse DNS Lookups
Time Synchronization
Mixing Ganglia Versions Older than 3.1 with Current Versions
SELinux and Firewall
Typical Problems and Troubleshooting Procedures
Web Issues
Blank page appears in the browser
Browser displays white page with error message
Cluster view shows uppercase hostname, link doesn’t work
Host appears in the wrong cluster
Host appears multiple times in web, different variations of the hostname (or IP address)
Some hosts appear with shortname instead of FQDN
One or more hosts don’t appear in the web interface
Hosts don’t appear/data stale after UDP aggregator restarted
Dead/retired hosts still appearing in the Web
Wrong CPU count or other metrics are missing
Fonts in graphs are too big or too small
Spikes in graphs
Custom metrics don’t appear
Custom metric’s value is truncated
Gaps appear randomly in the graphs
Some host is completely missing from the cluster
gmetad hierarchy and federation; some grids don’t appear on the Web
gmetad Issues
Empty (size = 0) RRD files created
gmetad takes a long time to start
gmetad segmentation fault writing to RRD
gmetad doesn’t poll all nodes defined in data_source
RRA definition changed in gmetad.conf, but RRD files are unchanged
rrdcached Issues
gmond Issues
gmond fails to start or localhost issues
gmond uses a lot of RAM
gmond doesn’t start properly on bootup
UDP receive buffer errors on a machine running gmond
7. Ganglia and Nagios
Sending Nagios Data to Ganglia
Monitoring Ganglia Metrics with Nagios
Principle of Operation
Check Heartbeat
Check a Single Metric on a Specific Host
Check Multiple Metrics on a Specific Host
Check Multiple Metrics on a Range of Hosts
Verify that a Metric Value Is the Same Across a Set of Hosts
Displaying Ganglia Data in the Nagios UI
Monitoring Ganglia with Nagios
Monitoring Processes
Monitoring Connectivity
Monitoring cron Collection Jobs
Collecting rrdcached Metrics
8. Ganglia and sFlow
Architecture
Standard sFlow Metrics
Server Metrics
Hypervisor Metrics
Java Virtual Machine Metrics
HTTP Metrics
memcache Metrics
Configuring gmond to Receive sFlow
Host sFlow Agent
Host sFlow Subagents
Custom Metrics Using gmetric
Troubleshooting
Are the Measurements Arriving at gmond?
Are the Measurements Being Sent?
Using Ganglia with Other sFlow Tools
9. Ganglia Case Studies
Tagged, Inc.
Site Architecture
Monitoring Configuration
Apache
memcached
Java
Examples
Optimizing memcached efficiency
Web load
Java performance
SARA
Overview
Advantages
Operational
Users
Customizations
Metrics
Custom graphs
Challenges
Central collector unicast receiver
Server RRD IO
Conclusion
Reuters Financial Software
Ganglia in the QA Environment
Market data overload
Analysis and reproducing the problem
Validating the solution
Ganglia in a Major Client Project
Upgrading takes too long
Analysis and studying the problem
Using Ganglia for the analysis
Results
Lumicall (Mobile VoIP on Android)
Monitoring Mobile VoIP for the Enterprise
Ganglia Monitoring Within Lumicall
Implementing gmetric4j Within Lumicall
Lumicall: Conclusion
Wait, How Many Metrics? Monitoring at Quantcast
Reporting, Analysis, and Alerting
Holt-Winters aberrance detection
Ganglia as an Application Platform
Best Practices
Using tmpfs to handle high IOPS
Sharding and instancing
Tools
snmp2ganglia
json2gmetrics
gmond plug-ins
RRD management scripts
Drawbacks
Necessity of sharding
RRD data consolidation
Coordination over a WAN
Excessive IOPS for RRD updates
Conclusions
Many Tools in the Toolbox: Monitoring at Etsy
Monitoring Is Mandatory
A Spectrum of Tools
Embrace Diversity
Conclusion
A. Advanced Metric Configuration and Debugging
Module Metric Definitions
Mod_MultiCPU
Mod_GStatus
Multidisk
memcached
TcpConn
Advanced Metrics Aggregation and You
Configuring statsd
statsd
statsd-c
py-statsd
Configuring VDED
rrdcached
Installing
Configuring gmetad for rrdcached
Controlling rrdcached
Troubleshooting
Permissions
Delays in metrics
Debugging with gmond-debug
B. Ganglia and Hadoop/HBase
Introducing Hadoop and HBase
Configuring Hadoop and HBase to Publish Metrics to Ganglia
Index
About the Authors
Colophon
Copyright