[Rocks-Discuss] "Some compute nodes not accepting jobs"

Hobbick, Christopher Charles chobbick at iupui.edu
Mon Jul 19 10:12:44 PDT 2010


Thanks. That worked.  Any idea what would cause that to happen?

-----Original Message-----
From: npaci-rocks-discussion-bounces at sdsc.edu [mailto:npaci-rocks-discussion-bounces at sdsc.edu] On Behalf Of Nick Holway
Sent: Monday, July 19, 2010 12:58 PM
To: Discussion of Rocks Clusters
Subject: Re: [Rocks-Discuss] "Some compute nodes not accepting jobs"

Can you do a quick "qstat -f" and see if any queues are flagged with errors (i.e. have an E in the states column on the right)? You can clear the error on a single queue instance with "qmod -c all.q@compute-x-x", or "qmod -c \*" will clear the errors from all queues.
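If you want to script the check, here is a rough Python sketch of the same two steps. It assumes the six-column "qstat -f" layout (queuename, qtype, resv/used/tot., load_avg, arch, states); older SGE releases print fewer columns, so adjust the index if yours differs. The function names are just for illustration and are not part of SGE.

```python
import subprocess


def queues_in_error(qstat_text):
    """Return the queue instances whose states column contains 'E'."""
    flagged = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Queue-instance rows start with "queue@host"; the header,
        # separator lines and job rows do not.
        if not fields or "@" not in fields[0]:
            continue
        # Assumption: states is the sixth column. It is empty for a
        # healthy queue, so the row may only have five fields.
        states = fields[5] if len(fields) > 5 else ""
        if "E" in states:
            flagged.append(fields[0])
    return flagged


def clear_errors(queue_instances):
    """Run "qmod -c" on each flagged queue instance."""
    for name in queue_instances:
        subprocess.run(["qmod", "-c", name], check=True)


def current_qstat():
    """Capture the output of "qstat -f" as text."""
    result = subprocess.run(
        ["qstat", "-f"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On the head node, as a user allowed to manage the queues, clear_errors(queues_in_error(current_qstat())) would clear only the queues that are actually in the error state rather than everything matched by "\*".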

Nick

On 19 July 2010 17:34, Hobbick, Christopher Charles <chobbick at iupui.edu> wrote:
> SGE
>
> -----Original Message-----
> From: npaci-rocks-discussion-bounces at sdsc.edu [mailto:npaci-rocks-discussion-bounces at sdsc.edu] On Behalf Of Bart Brashers
> Sent: Monday, July 19, 2010 12:12 PM
> To: Discussion of Rocks Clusters
> Subject: Re: [Rocks-Discuss] "Some compute nodes not accepting jobs"
>
>
> SGE or Torque/Maui?
>
> B
>
>> I have a 5.3 cluster with 6 nodes.  I've had no problems with anything until
>> last week, when a couple of jobs got stuck on 2 of the nodes.  A user had specified
>> in his job to run one job on compute-0-0 and the other on compute-0-1 (he's done
>> this before with no problems).  After the jobs never got submitted, the nodes
>> would no longer accept any jobs.  I've tried restarting the nodes, restarting the
>> sge service on the head node, and rebooting the whole cluster, but nothing seems
>> to work.  The other nodes accept jobs just fine, just not compute-0-0 and
>> compute-0-1.
>>
>> Thanks
>> Chris
>

