[Rocks-Discuss] ganglia not showing queue

Steven Berler berler at gmail.com
Mon Jun 23 14:17:11 PDT 2008


Gmetrics appears to be working. We used to have Rocks 4.2 installed, and all
of this worked correctly then. On the Gmetrics page I see a table with lots
of information on CPU, memory usage, etc. Here are the first few lines of
it:

14120120 compute-0-16.local ps-29342 cmd=python, user=art, %cpu=98.11, %mem=2.51, size=4, data=50812, shared=1284, vm=56012
14120120 compute-0-16.local ps-29963 cmd=python, user=art, %cpu=99.90, %mem=0.98, size=4, data=19136, shared=1284, vm=24300
20120120 compute-0-15.local ps-31579 cmd=discovery.exe, user=accelrys, %cpu=99.90, %mem=0.61, size=3900, data=8392, shared=4284, vm=65188
20120120 compute-0-15.local ps-1189 cmd=python, user=art, %cpu=98.78, %mem=1.52, size=4, data=30172, shared=1292, vm=35328
21120120 compute-0-6.local ps-29482 cmd=python, user=art, %cpu=99.44, %mem=1.40, size=4, data=27820, shared=1300, vm=32984
21120120 compute-0-6.local ps-28716 cmd=python, user=art, %cpu=99.44, %mem=2.47, size=4, data=50068, shared=1300, vm=55280
22120120 compute-0-0.local ps-25730 cmd=python, user=art, %cpu=99.90, %mem=2.51, size=4, data=50812, shared=1296, vm=56020
22120120 compute-0-0.local ps-28514 cmd=python, user=art, %cpu=99.90, %mem=1.40, size=4, data=27848, shared=1296, vm=33000
24120120 compute-0-23.local ps-29382 cmd=python, user=art, %cpu=99.37, %mem=1.12, size=4, data=21996, shared=1292, vm=27160
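
In case it helps anyone read rows like these programmatically, here is a
minimal Python sketch that splits one row into its key=value fields. The
function name parse_ps_metric is made up for illustration. It assumes that
every row contains "cmd=" and that no field value contains a comma, which
holds for the rows above but may not hold in general.

```python
def parse_ps_metric(record):
    """Split one 'ps-<pid>' Gmetrics row into a dict of its key=value fields."""
    # Assumption: everything from 'cmd=' onward is a comma-separated
    # list of key=value pairs, and no value itself contains a comma.
    start = record.index("cmd=")
    fields = {}
    for pair in record[start:].split(","):
        key, _, value = pair.strip().partition("=")
        fields[key] = value
    return fields


# First row from the table above, copied as it appears on the page.
record = ("14120120 compute-0-16.local ps-29342cmd=python, user=art, "
          "%cpu=98.11, %mem=2.51, size=4, data=50812, shared=1284, vm=56012")

info = parse_ps_metric(record)
print(info["cmd"], info["user"], float(info["%cpu"]))
```

All values come back as strings, so numeric fields such as %cpu need an
explicit float() or int() conversion before comparing them.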

The other very odd thing is that the Job Queue page says "0 Active Jobs.
0 of 92 Processors Active (0.00%)", but our cluster has only 50 processors
in total (including the frontend), and the Ganglia main page shows that
number correctly. The "92" seems to come from how our cluster was configured
when we were running Rocks 4.2; we are now using fewer compute nodes.

Another odd thing is that one of the compute nodes, compute-0-4, shows up in
Ganglia with CPU usage etc., but SGE has no record of compute-0-4 and will
not schedule anything on it (the output of qstat -f shows all the compute
nodes except compute-0-4). I'm currently reinstalling Rocks on compute-0-4
to see whether SGE will at least recognize it.
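
A rough way to spot this kind of mismatch is to compare the host names
Ganglia reports against the hosts that appear in the qstat -f output. The
Python sketch below does that. The sample text and host set are invented for
illustration (they are not our real output), the queue name all.q is an
assumption, and the helper name hosts_from_qstat is hypothetical.

```python
def hosts_from_qstat(qstat_output):
    """Collect host names from 'queue@host' entries in qstat -f style text."""
    hosts = set()
    for line in qstat_output.splitlines():
        parts = line.split()
        first = parts[0] if parts else ""
        # Queue instance rows start with 'queue@host'; header and
        # separator rows have no '@' in their first column.
        if "@" in first:
            hosts.add(first.split("@", 1)[1])
    return hosts


# Invented sample resembling qstat -f output; not taken from the cluster.
qstat_sample = """\
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@compute-0-0.local        BIP   0/2       0.00     lx26-x86
----------------------------------------------------------------------------
all.q@compute-0-6.local        BIP   0/2       0.00     lx26-x86
"""

# Invented set of hosts as they might be listed on the Ganglia page.
ganglia_hosts = {"compute-0-0.local", "compute-0-4.local", "compute-0-6.local"}

missing = sorted(ganglia_hosts - hosts_from_qstat(qstat_sample))
print(missing)
```

With the sample data above, the only host left over is compute-0-4.local,
i.e. the node that Ganglia sees but SGE does not.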

-Steven

On Mon, Jun 23, 2008 at 1:55 PM, Greg Bruno <greg.bruno at gmail.com> wrote:

> On Mon, Jun 23, 2008 at 11:51 AM, Steven Berler <sberler at hmc.edu> wrote:
> > [berler at tealia ~]$ rocks list roll
> > NAME        VERSION ARCH ENABLED
> > base:       5.0     i386 yes
> > bio:        5.0     i386 yes
> > ganglia:    5.0     i386 yes
> > hpc:        5.0     i386 yes
> > java:       5.0     i386 yes
> > kernel:     5.0     i386 yes
> > os:         5.0     i386 yes
> > sge:        5.0     i386 yes
> > web-server: 5.0     i386 yes
>
> what happens when you click the 'Gmetrics' link?
>
> if there is no output, then it appears your switch is not allowing
> multicast traffic. make sure that 'IGMP snooping' is enabled on your
> switch.
>
>  - gb
>