Archive for category FYI

Upcoming Ganglia Web features

We have been working hard on new Ganglia Web features that will be part of Ganglia Web 2.2.0. These are the highlights:

Compare Hosts

Allows you to compare hosts across all the matching metrics (this can mean hundreds of graphs :-) ). You supply a regular expression that matches a set of hosts, and Ganglia will aggregate all matching hosts for each metric. This is useful when you are trying to find out why a particular host or set of hosts is performing differently than another set.

Compare Hosts - Ganglia
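
To make the idea concrete, here is a minimal Python sketch of grouping metric values by host for every host that matches a regular expression. It is not Ganglia's actual code; the function name `group_metrics_by_host`, the `(host, metric, value)` input format and the sample host names are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def group_metrics_by_host(samples, host_pattern):
    """Group values per metric for hosts whose name matches host_pattern.

    samples is an iterable of (host, metric, value) tuples. This input
    format is hypothetical and only serves the illustration.
    Returns {metric: {host: value}} restricted to the matching hosts.
    """
    host_re = re.compile(host_pattern)
    grouped = defaultdict(dict)
    for host, metric, value in samples:
        if host_re.search(host):
            grouped[metric][host] = value
    return dict(grouped)


# Made-up sample data: two web servers and one database server.
samples = [
    ("web-01", "load_one", 0.4),
    ("web-02", "load_one", 3.9),
    ("db-01", "load_one", 1.1),
    ("web-01", "mem_free", 2048),
    ("web-02", "mem_free", 512),
]

# One entry per metric, each holding the values of all matching hosts,
# which is the data one graph per metric would need.
comparison = group_metrics_by_host(samples, r"^web-")
```

Each key of `comparison` corresponds to one graph, so a host that behaves differently from its peers stands out metric by metric.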

Built-in Nagios integration

This feature allows you to use your Ganglia trending data to alert in Nagios. There are a couple of nice additions to the basic check functionality, e.g.

  1. Check heartbeat – as you may know, gmond daemons send a periodic heartbeat (every 20 seconds by default). If the heartbeat is missing, it is fair to assume the host is down. This should save you from having to use things like check_ping and alert you to potential downtime much quicker.
  2. Check multiple metrics – allows you to use a single check to check multiple metrics on the same host, i.e. check that disk free on / is more than 30%, on /tmp more than 10% etc.
  3. Check single metric across multiple hosts (not yet implemented) – use a single check to check low disk space on a set of hosts defined by a regular expression, e.g. instead of having separate disk checks for every host you would have a single check that would give you a breakdown of hosts that were not OK.
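
As a rough illustration of the first two checks, the following Python sketch returns a Nagios-style status code and message. It is not the shipped check script: the function names, the metric names `disk_free_percent_root` and `disk_free_percent_tmp`, and the 60 second heartbeat limit (three missed 20 second heartbeats) are assumptions.

```python
OK, CRITICAL = 0, 2  # exit codes conventionally used by Nagios plugins


def check_heartbeat(seconds_since_heartbeat, max_age=60):
    """Treat the host as down if the last heartbeat is older than max_age.

    max_age=60 is an assumed limit (three missed 20 second heartbeats).
    """
    if seconds_since_heartbeat > max_age:
        return CRITICAL, "CRITICAL - last heartbeat %d seconds ago" % seconds_since_heartbeat
    return OK, "OK - last heartbeat %d seconds ago" % seconds_since_heartbeat


def check_multiple_metrics(current, minimums):
    """Check several metrics of one host in a single pass.

    current:  {metric_name: current_value}
    minimums: {metric_name: value the metric must exceed}
    Metric names are hypothetical placeholders.
    """
    failed = []
    for metric, minimum in sorted(minimums.items()):
        value = current.get(metric)
        # A missing metric is reported as a failure rather than ignored.
        if value is None or value <= minimum:
            failed.append("%s=%s (needs > %s)" % (metric, value, minimum))
    if failed:
        return CRITICAL, "CRITICAL - " + ", ".join(failed)
    return OK, "OK - all %d metrics within limits" % len(minimums)


status, message = check_multiple_metrics(
    {"disk_free_percent_root": 45, "disk_free_percent_tmp": 8},
    {"disk_free_percent_root": 30, "disk_free_percent_tmp": 10},
)
```

A real plugin would print the message and exit with the status code so that Nagios can pick up both.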

If you want to peek at how the basic check_metric alert works, check out the Ganglia Nagios integration wiki document.

Aggregate graphs decomposition

While viewing aggregate graphs with more than 6-7 items, colors will start to blend together and it may be hard to distinguish which item on the graph is which. This feature allows you to decompose a graph by taking every item on the aggregate graph and putting it on a separate graph, e.g. a graph like this

Aggregate Graph - Ganglia

will decompose into this

Aggregate graph decomposition

Flot client side rendering

We have been using flot, a JavaScript graphing library, for a while now. In this release we are planning to make it even more interactive, i.e. take items off the graph dynamically etc.

Utilization heatmaps

In this release we are turning on utilization heatmaps instead of the old style pie charts e.g.

heatmap

Most of the features have already been implemented. We are still polishing up the release and writing documentation. We could always use more help with testing and documenting things so if you are up to it please join us on Freenode channel #ganglia.

If you’d like to test drive some of these changes please visit our demo site.


Easy graph aggregation

We have just introduced an experimental new feature to our GWeb 2.0 UI that we are very excited about. The feature is called easy graph aggregation, as it allows you to graph the same metric across a number of hosts. This is often useful when you are proactively looking for problems within your infrastructure. We have made the feature even more powerful by allowing you to specify a regular expression that matches multiple hosts, so if all your database servers are named db-something you can simply say db as your regular expression, or db-0[1-5]. This feature is experimental, so if you match too many hosts you may end up with a broken image; however, we have decided to put it out as a preview of where we are going. Obligatory screenshots:

Line graph

Easy Graph Aggregation Line graph

Stacked graph

Easy Aggregate Stack Graph
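
To show how the two example expressions above select hosts, here is a small Python sketch. It assumes the expression is applied as an unanchored search against the host name, which may differ from what GWeb does internally; the helper name `select_hosts` and the host list are made up.

```python
import re


def select_hosts(hosts, pattern):
    """Return the hosts whose name contains a match for pattern.

    Uses an unanchored search (an assumption), so "db" also matches
    names that merely contain "db" somewhere.
    """
    host_re = re.compile(pattern)
    return [host for host in hosts if host_re.search(host)]


hosts = ["db-01", "db-02", "db-07", "web-01", "standby-db"]

all_db = select_hosts(hosts, "db")            # every name containing "db"
first_five = select_hosts(hosts, "db-0[1-5]")  # only db-01 through db-05
```

With an unanchored match, a short expression such as db can pull in more hosts than intended; anchoring it (for example `^db-`) narrows the selection and keeps the graph readable.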

Next steps

We need to add more error checking and fix bugs. We also want a better composer UI and the ability to add aggregate graphs to views. Stay tuned.

If you’d like to play with it, you can try it on our demo server. You can also read more about GWeb 2.0 and how to download it here.


Gweb 2.0

Ganglia has been around for over 10 years, but it is surprising even to me that our Web Frontend has seen very few cosmetic changes over the years.

Back in October 2010, I started an email thread on the ganglia-developers mailing list to kickstart a “re-write” of the frontend code. The idea is to make use of JavaScript libraries to improve the user experience and allow customizations to cater to individual needs. We also wanted to tackle issues like visualizing large amounts of data, which large sites managing tens of thousands of computers are increasingly facing. These sites also tend to track upwards of hundreds of metrics per host, bringing the total number of metrics monitored into the millions.

These are indeed challenging and interesting times for the project as we see a shift in the user base from the traditional High Performance Computing and Grid sites to large web 2.0 companies and companies in the Cloud space where hosts are dynamically provisioned.

After months of coding, we have something to show. This is mainly the effort of Vladimir Vuksan, Erik Kastner, John Goulah and Alex Dean. A demo of the new frontend can be seen here.

(Thanks Joyent for hosting and Andy Cobaugh from Penn State University, Center for Comparative Genomics and Bioinformatics for providing access to their gmond metrics data)

For a more in-depth description of this work, you can read Vladimir’s blog posts:

http://vuksan.com/blog/2010/12/10/rethinking-ganglia-web-ui
http://vuksan.com/blog/2011/02/20/json-representation-for-graphs-in-ganglia

To play with the code, you can get it from GitHub.

We value your suggestions and feedback, so please don’t be shy: tweet about it @gangliainfo, ping us on IRC #ganglia at irc.freenode.net or start an email thread on the ganglia-developers mailing list!

We would like to release this code soon to the public, but we need your help to implement additional features, test the code, etc. So if you are interested, please let us know!


Got Tweets?

Did you know that we are on Twitter? Follow us @gangliainfo here: http://twitter.com/gangliainfo. If you have something interesting to say about Ganglia, use the hashtag #Ganglia (be nice, we are sharing this with the Biology folks) and we just might re-tweet it! The Twitter feed is also available on this webpage on the right-hand side (although re-tweets are hidden in this view).

Happy Twittering/Tweeting!


Monitoring Hadoop Clusters with Ganglia

Apache Hadoop is an open-source implementation of MapReduce. Hadoop users will be happy to know that Hadoop has built-in support for publishing run-time metrics using Ganglia. For more details, visit the GangliaContext page on the Hadoop Wiki or Philip Zeyliger’s blog post on the Cloudera blog. Cloudera offers an Apache 2.0 licensed distribution to make managing Hadoop clusters easier.


Slides from ‘Capacity Planning for LAMP’ talk at MySQL Conf 2007

John Allspaw, Engineering Manager at Flickr (Yahoo!), gave a talk on how Flickr uses Ganglia to help with capacity planning. The talk covers a lot of the subtleties and challenges facing hugely successful web services like Flickr.


Building on AIX using the native compiler

Hi,

this is basically the README.AIX file that will be in 3.0.4. It now has a better recipe for building with the native XLC compiler. It also describes what is needed to build “gmetad”. I thought it useful to publish this now.

Using Ganglia on AIX
~~~~~~~~~~~~~~~~~~~~

This version has been tested on AIX 5.1, 5.2 and 5.3. AIX 4.3 might work as well,
but it has not been tested yet.

Installation
~~~~~~~~~~~~

You still need some “tricks” to use ganglia on an AIX system:

1. The AIX version should not be compiled with shared libraries.
You must add the “--disable-shared” and “--enable-static” configure
flags if you are running on AIX:

./configure --disable-shared --enable-static

2. You should use “gcc”. xlc does not work out of the box. If you only have
“xlc”, the following might work. Run configure first !!

a) remove “-Wall” from all Makefiles, especially:

lib/Makefile
gmond/gstat/Makefile
gmond/Makefile
gmetric/Makefile
gmetad/Makefile (see below)

This should be done automatically, but automake/autoconf experts are
needed.

b) to actually build the binaries do:

make CC="cc -qlanglvl=extc99"

c) To build “gmetad”, the following is needed:
c1) install the following software, preferably from RPMs:

libart_lgpl-devel-2.3.16-1
freetype2-devel-2.1.7-2
zlib-1.2.2-4
libpng-1.2.1-6
freetype2-2.1.7-2
libart_lgpl-2.3.16-1
rrdtool-1.2.11.perl56-1 (cp /opt/freeware/include/rrd.h /usr/include/ )

c2) For Gmetad-3.0.3 or earlier: there is a conflict regarding the macro
“FRAMESIZE”. In “gmetad/*” change all occurrences of “FRAMESIZE” to
“GMETAD_FRAMESIZE”. This will be fixed in version 3.0.4.

Known problems and Limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Occasionally Ganglia might report wrong network statistics, because
there is no test for arithmetic overflow of the AIX counters yet.
(Will be fixed soon, but might not make it into ganglia-3.0.2)

- The following standard metrics are _not_ reported (reported as 0):
mem_buffers (-), mem_shared (-), part_max_used (+), cpu_sintr
(--), cpu_intr (--), cpu_aidle (+), cpu_nice (--)

(--) cpu_nice, cpu_intr and cpu_sintr:
There is no way to include these metrics, because AIX
does not know anything about them.

(-) mem_buffers and mem_shared: libperfstat does not report
this information, but maybe somebody knows another way.

(+) part_max_used and cpu_aidle: it’s quite easy to report these
metrics as well using libperfstat, but nobody has written
code so far.
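
The counter overflow problem mentioned in the first item can be illustrated with a short Python sketch of a wrap-aware difference between two counter readings. This is not the gmond code (which is written in C); the function names are hypothetical, the 32-bit counter width is an assumption, and the approach only works if the counter wraps at most once between two samples.

```python
def counter_delta(previous, current, counter_bits=32):
    """Difference between two readings of an ever-increasing counter.

    The counter is assumed to wrap around at 2**counter_bits, so a
    reading that is smaller than the previous one is treated as one
    wrap-around, not as a negative amount of traffic.
    """
    modulus = 1 << counter_bits
    return (current - previous) % modulus


def rate_per_second(previous, current, interval_seconds, counter_bits=32):
    """Average rate over the sampling interval, tolerant of one wrap."""
    return counter_delta(previous, current, counter_bits) / interval_seconds


# The counter wrapped between the two samples: 296 units up to the
# wrap point plus 200 units after it.
wrapped = counter_delta(4294967000, 200)
```

Without the modulo step, the same pair of readings would produce a huge negative difference, which is the kind of wrong network statistic described above.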


What is Ganglia?

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.

Ganglia is a BSD-licensed open-source project that grew out of the University of California, Berkeley Millennium Project which was initially funded in large part by the National Partnership for Advanced Computational Infrastructure (NPACI) and National Science Foundation RI Award EIA-9802069. NPACI is funded by the National Science Foundation and strives to advance science by creating a ubiquitous, continuous, and pervasive national computational infrastructure: the Grid. Current support comes from Planet Lab: an open platform for developing, deploying, and accessing planetary-scale services.


Ganglia is part of OSCAR 4.0

The Open Cluster Group is pleased to announce the release of OSCAR
version 4.0.

Feature list of 4.0:

  • Red Hat Linux 9, Red Hat Enterprise Linux (RHEL) 3, and Fedora Core 2 support
  • New RPM dependency finder helps build the server (DepMan/PackMan)
  • SIS 3.3.2
  • Ganglia is now included in the distribution
  • Torque is now included as the default scheduler (OpenPBS can still be downloaded from OPD)
  • Multiple bug fixes and Wizard improvements

This release supports both x86 and Itanium systems. Itanium support is provided by RHEL 3.

This release is available for download from the OSCAR project website: http://oscar.openclustergroup.org


Linux POSIX Threads

People who use gexec and pcp on the latest Linux kernels will find that they hang when executed. The problem is that Linux 2.4.x doesn’t
implement the full set of POSIX cancellation points (e.g., sem_wait,
sigwait, etc. are not implemented). This, it turns out, is the
fundamental cause of GEXEC and PCP hanging on these systems. Also,
terminal-related signals (e.g., SIGTTIN) don’t appear to be handled
correctly. I’m told that in 2.6.x kernels, some of these problems
have been fixed. But in the meantime, set your LD_ASSUME_KERNEL environment variable before you start gexec daemons or clients:

export LD_ASSUME_KERNEL="2.4.10"
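
If the daemons or clients are started from a script rather than an interactive shell, the same variable can be set only for the child process. The Python sketch below builds such an environment; the helper name `env_with_assumed_kernel` is hypothetical, and the command shown in the comment is a placeholder, not a documented gexec invocation.

```python
import os


def env_with_assumed_kernel(base_env=None, kernel_version="2.4.10"):
    """Return a copy of the environment with LD_ASSUME_KERNEL set.

    base_env defaults to the current process environment. The original
    mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_ASSUME_KERNEL"] = kernel_version
    return env


child_env = env_with_assumed_kernel({"PATH": "/usr/bin"})

# The environment can then be passed to a child process, for example:
#   subprocess.run(["gexec", ...], env=env_with_assumed_kernel())
# (arguments elided; substitute your own gexec command line).
```

Setting the variable per process keeps the workaround away from unrelated programs running in the same session.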

In the future most (if not all) ganglia components will not rely on POSIX threads at all given the chaotic nature of threads on Linux.
