[Rocks-Discuss]pbsnodes offline status problem

Yong Duan yduan at udel.edu
Mon Apr 14 13:19:07 PDT 2003


We still use ROCKS 2.2, which may explain it ... But I cross my heart that it
happens to us ...

yong
-----Original Message-----
From: Federico Sacerdoti [mailto:fds at sdsc.edu] 
Sent: Monday, April 14, 2003 3:58 PM
To: Yong Duan
Cc: 'Cody Hammock'; npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]pbsnodes offline status problem



On Monday, April 14, 2003, at 07:31 AM, Yong Duan wrote:

> 1) Identify those down nodes by this space-age high-tech "ping". The 
> most reliable way is to sequentially ping every node and identify 
> those that fail to respond. You can do this by a script of about 5 
> lines. Forget about ganglia. It never worked for me. It likes you to 
> believe that machines are perfectly happy when those same machines are
> dead for years.
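
For reference, the sequential ping check described in the quoted text could
look roughly like the sketch below. It is only an illustration, not the
script the poster used: the function names ping_host and find_down_nodes are
made up here, and the "-c 1 -W <seconds>" flags assume the Linux (iputils)
ping command.

```python
import subprocess


def ping_host(host, timeout_s=1):
    """Return True if `host` answers one ICMP echo request.

    Assumes a Linux (iputils) ping, where -c is the packet count and
    -W is the reply timeout in seconds.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,  # hard stop in case ping itself hangs
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is missing or hung: report the node as unreachable
        return False
    return result.returncode == 0


def find_down_nodes(hosts, ping=ping_host):
    """Ping each host in turn and return those that did not respond."""
    return [host for host in hosts if not ping(host)]
```

Calling find_down_nodes with a list of node names (hypothetically
"compute-0-0", "compute-0-1", and so on) returns the names that failed to
answer. The ping argument can be swapped for another check, for example one
that also tries an ssh connection.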

I don't see how a machine could participate in the ganglia network while 
not actually being "alive". There was a problem with the ganglia 
webpage in the past that caused dead nodes to be incorrectly displayed, 
but that has been long fixed. I have not seen the type of false 
positives (ganglia showing a healthy node when it is dead) you 
describe. Has anyone else?

Federico

Rocks Cluster Group, SDSC, San Diego, CA



