Archive for category FYI
Monitoring Hadoop Clusters with Ganglia
Posted by Matt Massie in FYI on April 22, 2009
Apache Hadoop is an open-source implementation of MapReduce. Hadoop users will be happy to know that Hadoop has built-in support for publishing run-time metrics using Ganglia. For more details, visit the GangliaContext page on the Hadoop Wiki or Philip Zeyliger’s blog post on the Cloudera blog. Cloudera offers an Apache 2.0 licensed distribution to make managing Hadoop clusters easier.
Slides from ‘Capacity Planning for LAMP’ talk at MySQL Conf 2007
Posted by Matt Massie in FYI on April 28, 2007
John Allspaw, Engineering Manager at Flickr (Yahoo!), gave a talk on how Flickr uses Ganglia to help with capacity planning. The talk covers a lot of the subtleties and challenges facing hugely successful web services like Flickr.
Building on AIX using the native compiler
Hi,
this is basically the README.AIX file that will be in 3.0.4. It now has a better recipe for building with the native XLC compiler. It also describes what is needed to build “gmetad”. I thought it would be useful to publish this now.
Using Ganglia on AIX
~~~~~~~~~~~~~~~~~~~~
This version has been tested on AIX 5.1, 5.2 and 5.3. AIX 4.3 might work as well,
but it has not been tested yet.
Installation
~~~~~~~~~~~~
You still need some “tricks” to use ganglia on an AIX system:
1. The AIX version should not be compiled with shared libraries.
You must add the “--disable-shared” and “--enable-static” configure
flags if you are running on AIX.
2. You should use “gcc”. xlc does not work out of the box. If you only have
“xlc”, the following might work. Run configure first!
a) remove “-Wall” from all Makefiles, especially:
gmond/gstat/Makefile
gmond/Makefile
gmetric/Makefile
gmetad/Makefile (see below)
This should be done automatically, but that requires help from
automake/autoconf experts.
b) to actually build the binaries do:
c) To build “gmetad”, the following is needed:
c1) install the following software, preferably from RPMs:
freetype2-devel-2.1.7-2
zlib-1.2.2-4
libpng-1.2.1-6
freetype2-2.1.7-2
libart_lgpl-2.3.16-1
rrdtool-1.2.11.perl56-1 (cp /opt/freeware/include/rrd.h /usr/include/ )
c2) For Gmetad-3.0.3 or earlier: there is a conflict regarding the macro
“FRAMESIZE”. In “gmetad/*” change all occurrences of “FRAMESIZE” to
“GMETAD_FRAMESIZE”. This will be fixed in version 3.0.4.
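The manual edits in steps 2a and c2 can be scripted. The following is only a sketch, not part of the original README: the function names `strip_wall` and `rename_framesize` are hypothetical, and the `gmetad/*.c gmetad/*.h` glob in the usage comment is an assumption about which files under “gmetad/*” contain the macro. It writes through a temporary file instead of relying on an in-place option of sed, which not every sed provides.

```shell
#!/bin/sh
# Sketch of the manual fix-ups from steps 2a and c2 (hypothetical helpers).
# Intended to be run from the top of the source tree after configure.

# Step 2a: remove "-Wall" from the given Makefiles.
strip_wall() {
    for mf in "$@"; do
        sed 's/-Wall//g' "$mf" > "$mf.tmp" && mv "$mf.tmp" "$mf"
    done
}

# Step c2: rename FRAMESIZE to GMETAD_FRAMESIZE in the given source files.
# The first expression undoes an earlier rename, so running this twice
# does not produce GMETAD_GMETAD_FRAMESIZE.
rename_framesize() {
    for src in "$@"; do
        sed -e 's/GMETAD_FRAMESIZE/FRAMESIZE/g' \
            -e 's/FRAMESIZE/GMETAD_FRAMESIZE/g' "$src" > "$src.tmp" \
            && mv "$src.tmp" "$src"
    done
}

# Example usage (file list for strip_wall taken from step 2a; the glob for
# rename_framesize is an assumption):
#   strip_wall gmond/gstat/Makefile gmond/Makefile gmetric/Makefile gmetad/Makefile
#   rename_framesize gmetad/*.c gmetad/*.h
```

Note that the rename matches the plain text “FRAMESIZE” anywhere on a line, so it is worth reviewing the result before building.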
Known problems and Limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
– Occasionally Ganglia might report wrong network statistics, because
there is no test yet for arithmetic overflow of the AIX counters.
(Will be fixed soon, but might not make it into ganglia-3.0.2)
– The following standard metrics are _not_ reported (reported as 0):
mem_buffers (-), mem_shared (-), part_max_used (+), cpu_sintr (--),
cpu_intr (--), cpu_aidle (+), cpu_nice (--)
(--) cpu_nice, cpu_intr and cpu_sintr:
There is no way to include these metrics, because AIX
does not know anything about them.
(-) mem_buffers and mem_shared: libperfstat does not report
this information, but maybe somebody knows another way.
(+) part_max_used and cpu_aidle: it is quite easy to report these
metrics as well using libperfstat, but nobody has written the
code so far.
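To illustrate the counter overflow problem from the first item: when a cumulative counter wraps around between two samples, a plain subtraction gives a negative (wrong) rate. The sketch below shows one common way to compute a wrap-aware difference. It is not the Ganglia fix; the helper name `counter_delta` is hypothetical, and the counter width of 32 bits is an assumption that would have to be checked against the actual AIX counters. It also assumes the shell does 64-bit integer arithmetic and that at most one wrap happens between samples.

```shell
#!/bin/sh
# Hypothetical helper: difference between two samples of a cumulative
# counter that is ASSUMED to wrap around at 2^32.
COUNTER_MOD=4294967296

counter_delta() {
    prev=$1
    cur=$2
    # Adding the modulus keeps the intermediate value non-negative.
    echo $(( (cur - prev + COUNTER_MOD) % COUNTER_MOD ))
}

counter_delta 100 250          # no wrap between samples: prints 150
counter_delta 4294967290 5     # counter wrapped once:    prints 11
```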
What is Ganglia?
Posted by Matt Massie in FYI on November 9, 2005
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.
Ganglia is a BSD-licensed open-source project that grew out of the University of California, Berkeley Millennium Project which was initially funded in large part by the National Partnership for Advanced Computational Infrastructure (NPACI) and National Science Foundation RI Award EIA-9802069. NPACI is funded by the National Science Foundation and strives to advance science by creating a ubiquitous, continuous, and pervasive national computational infrastructure: the Grid. Current support comes from Planet Lab: an open platform for developing, deploying, and accessing planetary-scale services.
Ganglia is part of OSCAR 4.0
Posted by Matt Massie in FYI on January 11, 2005
The Open Cluster Group is pleased to announce the release of OSCAR
version 4.0.
Feature list of 4.0:
- Red Hat Linux 9, Red Hat Enterprise Linux (RHEL) 3, and Fedora Core 2 support
- New RPM dependency finder helps build the server (DepMan/PackMan)
- SIS 3.3.2
- Ganglia is now included in the distribution
- Torque is now included as the default scheduler (OpenPBS can still be downloaded from OPD)
- Multiple bug fixes and Wizard improvements
This release supports both x86 and Itanium systems. Itanium support is provided by RHEL 3.
This release is available for download from the OSCAR project website: http://oscar.openclustergroup.org
Linux POSIX Threads
Posted by Matt Massie in FYI on September 16, 2004
People who use gexec and pcp on the latest Linux kernels will find that they hang when executed. The problem is that Linux 2.4.x doesn’t implement the full set of POSIX cancellation points (e.g., sem_wait, sigwait, etc. are not implemented). This, it turns out, is the fundamental cause of GEXEC and PCP hanging on these systems. Also, terminal-related signals (e.g., SIGTTIN) don’t appear to be handled correctly. I’m told that in 2.6.x kernels, some of these problems have been fixed. In the meantime, set your LD_ASSUME_KERNEL environment variable before you start gexec daemons or clients:
export LD_ASSUME_KERNEL="2.4.10"
In the future most (if not all) ganglia components will not rely on POSIX threads at all given the chaotic nature of threads on Linux.
