Archive for category FYI
We have been working hard on new features that will be part of Ganglia Web 2.2.0. These are the highlights:
Allows you to compare hosts across all matching metrics (this can mean hundreds of graphs). You supply a regular expression that matches a set of hosts, and Ganglia will aggregate all hosts for each metric. This is useful when you are trying to find out why a particular host or set of hosts is performing differently than another set.
Built-in Nagios integration
This feature allows you to use your Ganglia trending data to alert in Nagios. There are a couple of nice additions to the basic check functionality, e.g.
- Check heartbeat – as you may know, gmond daemons send a periodic heartbeat (every 20 seconds by default). If the heartbeat is missing, it is fair to assume the host is down. This should save you from having to use things like check_ping and alert you to potential downtime much more quickly
- Check multiple metrics – allows you to use a single check for multiple metrics on the same host, i.e. check that disk free on / is more than 30%, on /tmp more than 10%, etc.
- Check single metric across multiple hosts (not yet implemented) – use a single check to check for low disk space on a set of hosts defined by a regular expression, e.g. instead of having separate disk checks for every host you would have a single check that would give you a breakdown of the hosts that were not OK.
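As a sketch of how such checks might be wired into Nagios, the fragment below defines a command and a service around a Ganglia-backed plugin. The plugin name, path, and argument syntax here are all assumptions for illustration only; the actual interface is described in the Ganglia Nagios integration wiki document.

```
# Hypothetical Nagios object definitions; the plugin name, path, and
# arguments are illustrative assumptions, not the actual Ganglia interface.
define command {
    command_name  check_ganglia_metric
    command_line  /usr/local/nagios/libexec/check_ganglia_metric.sh host=$HOSTADDRESS$ metric_name=$ARG1$ operator=$ARG2$ critical_value=$ARG3$
}

define service {
    use                   generic-service
    host_name             db-01
    service_description   Disk free on /
    check_command         check_ganglia_metric!disk_free!less!30
}
```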
If you want to peek at how the basic check_metric alert works, check out the Ganglia Nagios integration wiki document.
Aggregate graphs decomposition
While viewing aggregate graphs with more than 6-7 items, colors will start to blend together and it may be hard to distinguish what is what on the graph. This feature allows you to decompose a graph by taking every item on the aggregate graph and putting it on a separate graph, e.g. a graph like this
will decompose into this
Flot client side rendering
In this release we are turning on utilization heatmaps instead of the old-style pie charts, e.g.
Most of the features have already been implemented. We are still polishing up the release and writing documentation. We could always use more help with testing and documenting things, so if you are up to it, please join us in the Freenode channel #ganglia.
If you’d like to test drive some of these changes please visit our demo site.
We have just introduced an experimental new feature to our GWeb 2.0 UI that we are very excited about. The feature is called easy graph aggregation, as it allows you to graph the same metric across a number of hosts. This is often useful when you are proactively looking for problems within your infrastructure. We have made the feature even more powerful by allowing you to specify a regular expression that matches multiple hosts, so if all your database servers are named db-something you can simply supply db as your regular expression, or db-0[1-5]. This feature is experimental, so if you match too many hosts you may end up with a broken image; however, we have decided to put it out as a preview of where we are going. Obligatory screenshots:
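The matching happens server-side in gweb, but you can preview what a pattern such as db-0[1-5] would select by running the same kind of POSIX extended regular expression over a host list with grep -E (the host names below are made up for illustration):

```shell
# Made-up host names; gweb applies a regex like this to its own host list.
printf '%s\n' db-01 db-02 db-07 web-01 | grep -E 'db-0[1-5]'
# prints:
#   db-01
#   db-02
```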
We still need to add more error checking and bug fixes, a better composer UI, and the ability to add aggregate graphs to views. Stay tuned.
Ganglia has been around for over 10 years, but it is surprising even to me that our web frontend has seen very little cosmetic change over the years.
These are indeed challenging and interesting times for the project as we see a shift in the user base from the traditional High Performance Computing and Grid sites to large web 2.0 companies and companies in the Cloud space where hosts are dynamically provisioned.
For more in-depth description about this work, you can read Vladimir’s blog posts:
To play with the code, you can get it from GitHub.
We value your suggestions and feedback, so please don’t be shy: either tweet about it @gangliainfo, ping us on IRC in #ganglia at irc.freenode.net, or start an email thread on the ganglia-developers mailing list!
We would like to release this code soon to the public, but we need your help to implement additional features, test the code, etc. So if you are interested, please let us know!
Did you know that we are on Twitter? Follow us @gangliainfo here: http://twitter.com/gangliainfo. If you have something interesting to say about Ganglia, use the hashtag #Ganglia (be nice, we are sharing this with the Biology folks) and we just might re-tweet it! The Twitter feed is also available on this webpage, on the right-hand side (although re-tweets are hidden in this view).
Apache Hadoop is an open-source implementation of MapReduce. Hadoop users will be happy to know that Hadoop has built-in support for publishing run-time metrics using Ganglia. For more details, visit the GangliaContext page on the Hadoop Wiki or Philip Zeyliger’s blog post on the Cloudera blog. Cloudera offers an Apache 2.0 licensed distribution to make managing Hadoop clusters easier.
John Allspaw, Engineering Manager at flickr (Yahoo!), gave a talk on how flickr uses Ganglia to help with capacity planning. The talk covers a lot of the subtleties and challenges facing hugely successful web services like flickr.
This is basically the README.AIX file that will be in 3.0.4. It now has a better recipe for building with the native XLC compiler. It also describes what is needed to build “gmetad”. I thought it useful to publish this now.
Using Ganglia on AIX
This version is tested on AIX 5.1, 5.2 and 5.3. AIX 4.3 might work as well,
but it has not been tested yet.
You still need some “tricks” to use Ganglia on an AIX system:
1. The AIX version should not be compiled with shared libraries.
You must add the “--disable-shared” and “--enable-static” configure
flags if you are running on AIX.
2. You should use “gcc”. xlc does not work out of the box. If you only have
“xlc”, the following might work. Run configure first!
a) remove “-Wall” from all Makefiles, especially:
gmetad/Makefile (see below)
This should be done automatically, but automake/autoconf experts are
needed to make that happen.
b) to actually build the binaries do:
c) To build “gmetad”, the following is needed:
c1) install the following software, preferably from RPMs:
rrdtool-1.2.11.perl56-1 (cp /opt/freeware/include/rrd.h /usr/include/)
c2) For gmetad 3.0.3 or earlier: there is a conflict regarding the macro
“FRAMESIZE”. In “gmetad/*”, change all occurrences of “FRAMESIZE” to
“GMETAD_FRAMESIZE”. This will be fixed in version 3.0.4.
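Taken together, steps 1, 2a and c2 above could be scripted roughly as follows. This is only a sketch under a few assumptions: it presumes GNU sed (AIX’s native sed has no -i flag), and the exact file list for the FRAMESIZE rename may differ in your source tree.

```shell
# Sketch of the manual xlc build steps above; assumes GNU sed is installed.
./configure --disable-shared --enable-static CC=xlc

# a) strip "-Wall" from the generated Makefiles (xlc does not understand it)
find . -name Makefile -exec sed -i 's/-Wall//g' {} +

# c2) gmetad 3.0.3 or earlier: rename the conflicting FRAMESIZE macro
sed -i 's/FRAMESIZE/GMETAD_FRAMESIZE/g' gmetad/*.c gmetad/*.h

make
```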
Known problems and Limitations
– Occasionally Ganglia might report wrong network statistics, because
there is no test for arithmetic overflow of the AIX counters yet.
(This will be fixed soon, but might not make it into ganglia-3.0.2.)
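For reference, the missing overflow test amounts to this: when a 32-bit counter wraps past 2^32, the naive new-minus-old delta goes negative, and the fix is to add 2^32 back. A minimal sketch in shell arithmetic (the 32-bit width is an assumption; some AIX counters may be wider):

```shell
# Sketch: a byte-counter delta that survives a 32-bit wrap-around.
prev=4294967000   # previous sample, close to the 32-bit limit
curr=400          # current sample: the counter has wrapped past 2^32
if [ "$curr" -ge "$prev" ]; then
  delta=$((curr - prev))
else
  delta=$((curr + 4294967296 - prev))   # add 2^32 back after a wrap
fi
echo "$delta"   # prints 696
```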
– The following standard metrics are _not_ reported (reported as 0):
mem_buffers (-), mem_shared (-), part_max_used (+), cpu_sintr
(--), cpu_intr (--), cpu_aidle (+), cpu_nice (--)
(--) cpu_nice, cpu_intr and cpu_sintr:
There is no way to include these metrics, because AIX
does not know anything about them
(-) mem_buffers and mem_shared: libperfstat does not report
this information, but maybe somebody knows another way.
(+) part_max_used and cpu_aidle: it’s quite easy to implement these
metrics as well using libperfstat, but nobody has written the
code so far.
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.
Ganglia is a BSD-licensed open-source project that grew out of the University of California, Berkeley Millennium Project which was initially funded in large part by the National Partnership for Advanced Computational Infrastructure (NPACI) and National Science Foundation RI Award EIA-9802069. NPACI is funded by the National Science Foundation and strives to advance science by creating a ubiquitous, continuous, and pervasive national computational infrastructure: the Grid. Current support comes from Planet Lab: an open platform for developing, deploying, and accessing planetary-scale services.
The Open Cluster Group is pleased to announce the release of OSCAR 4.0.
Feature list of 4.0:
- Red Hat Linux 9, Red Hat Enterprise Linux (RHEL) 3, and Fedora Core 2 support
- New RPM dependency finder helps build the server (DepMan/PackMan)
- SIS 3.3.2
- Ganglia is now included in the distribution
- Torque is now included as the default scheduler (OpenPBS can still be downloaded from OPD)
- Multiple bug fixes and Wizard improvements
This release supports both x86 and Itanium systems. Itanium support is provided by RHEL 3.
This release is available for download from the OSCAR project website: http://oscar.openclustergroup.org
People who use gexec and pcp on the latest Linux kernels will find that they hang when executed. The problem is that Linux 2.4.x doesn’t implement the full set of POSIX cancellation points (e.g., sem_wait, sigwait, etc. are not implemented). This, it turns out, is the fundamental cause of GEXEC and PCP hanging on these systems. Also, terminal-related signals (e.g., SIGTTIN) don’t appear to be handled correctly. I’m told that in 2.6.x kernels some of these problems have been fixed. But in the meantime, set your LD_ASSUME_KERNEL environment variable before you start gexec daemons or clients.
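The post does not say which value to set; a commonly used one at the time for forcing the older LinuxThreads implementation was 2.2.5. That value is an assumption here, so check which thread library your glibc actually selects:

```shell
# Assumption: 2.2.5 makes the glibc dynamic linker pick the old
# LinuxThreads library instead of the newer threading implementation.
export LD_ASSUME_KERNEL=2.2.5
# then start the daemon (or client) from this same environment, e.g.:
# gexecd
```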
In the future, most (if not all) Ganglia components will not rely on POSIX threads at all, given the chaotic nature of threads on Linux.