[Rocks-Discuss]Newbie question: alternative method for installing compute nodes with two NICs?

Wheeler, Dr M.D. mdw10 at leicester.ac.uk
Tue Jan 25 02:52:27 PST 2005


Oh dear,
I have deleted some of the extra NICs using add-extra-nic --del.

They no longer appear in add-extra-nic --list compute-0-0:
 root]# add-extra-nic --list compute-0-0
------------------------------------------------------
|                    compute-0-0                     |
------------------------------------------------------
| Adapter |       IP       |  Netmask  |     Name    |
------------------------------------------------------
|    eth0 | 10.255.255.254 | 255.0.0.0 | compute-0-0 |
------------------------------------------------------

but they are still in the /etc/hosts file:

root]# more /etc/hosts
#
# Do NOT Edit (generated by dbreport)
#
127.0.0.1       localhost.localdomain   localhost
*.*.*.*         *.*.*.*.*
10.1.1.1        marvin.local marvin # originally frontend-0-0
10.255.255.254  compute-0-0.local compute-0-0 c0-0
10.255.255.253  compute-0-1.local compute-0-1 c0-1
10.255.255.252  compute-0-2.local compute-0-2 c0-2
10.255.255.251  32-bit-compute-0-0.local 32-bit-compute-0-0
192.168.1.1     mpi-0-0
192.168.1.1     mpi-0-0
192.168.1.2     mpi-0-1
192.168.1.2     mpi-0-1
192.168.1.3     mpi-0-2
192.168.1.3     mpi-0-2

Can I edit this file by hand, or is there some way of updating it?
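
(Since the header says the file is generated by dbreport, I assume hand
edits would just get overwritten on the next update; my untested guess is
that something like

  dbreport hosts > /etc/hosts

would regenerate it from the database, but I'd welcome confirmation.)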

Thanks

Martyn

----------------------------------------------
Dr. Martyn D. Wheeler
Department of Chemistry
University of Leicester
University Road
Leicester, LE1 7RH, UK.
Tel (office): +44 (0)116 252 3985
Tel (lab):    +44 (0)116 252 2115
Fax:          +44 (0)116 252 3789
Email:        martyn.wheeler at le.ac.uk
http://www.le.ac.uk/chemistry/staff/mdw10.html
 

> -----Original Message-----
> From: npaci-rocks-discussion-admin at sdsc.edu
> [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of 
> Marcelo Matus
> Sent: 25 January 2005 09:06
> To: Wheeler, Dr M.D.
> Cc: mmatus at dinha.acms.arizona.edu; Huiqun Zhou;
> npaci-rocks-discussion at sdsc.edu
> Subject: Re: [Rocks-Discuss]Newbie question: alternative method for
> installing compute nodes with two NICs?
> 
> 
> I think the problem is the explicit gateway you gave to eth1.
> 
> I would delete the extra NIC using
> 
>   add-extra-nic --del  ...
> 
> or using the Web interface, and add it again without the gateway option.
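> 
> For example (just my reading of it), the re-add would be the same command
> shown further down in this thread, minus the gateway option:
> 
>   add-extra-nic --if=eth1 --ip=192.168.1.1 --netmask=255.255.255.0 \
>     --name=mpi-0-0 compute-0-0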
> 
> And here is how you use or test them:
> 
> 1.- You need two nodes with the extra NIC working (without the gateway option).
> 
>       compute-0-0 (eth0) + compute-mpi-0-0 (eth1)
>       compute-0-1 (eth0) + compute-mpi-0-1 (eth1)
> 
> 2.- If everything works, from compute-0-0 (not the frontend) you should be able to do
> 
>       $ ping compute-0-1
>       $ ping compute-mpi-0-1
> 
> 3.- Yes, sorry, if you want to do a test from the frontend, you need to have three interfaces.
> 
>              eth0 -> cluster
>              eth1 -> external network
>              eth2 -> 192.168.1.xxx (secondary cluster network)
>     
>      or add a specific route/gateway for the 192.168.1.0 network.
>      But again, this is not needed if you want to use all the secondary
>      NICs in the cluster for MPI.
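> 
>      (For the route option, a rough sketch of what I mean -- point the
>      192.168.1.0 network at one compute node's eth0 address, and enable
>      IP forwarding on that node:
> 
>        route add -net 192.168.1.0 netmask 255.255.255.0 gw 10.255.255.254
> 
>      where 10.255.255.254 is compute-0-0 in your /etc/hosts.)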
> 
>      In fact, if you add the secondary NIC properly, and from the
>      frontend you try
> 
>       $ ping compute-mpi-0-0
> 
>      that will result in a ping using the normal eth0 address, since
>      from the two addresses in the database, the frontend will choose
>      the one for which it has an interface, ie, eth0.
> 
> Marcelo
> 
> 
> 
> 
> Wheeler, Dr M.D. wrote:
> 
> >My frontend does have two NICs but one is being used to connect to the
> >outside world, and the other is connected to the private rocks network.
> >Do I need another NIC in my frontend to talk to the 192.168.*.* network?
> >
> >Sorry, but I am a little lost here...
> >
> >BTW, when I do a shoot-node after add-extra-nic I see the usual:
> >
> >root]# shoot-node compute-0-1
> >Shutting down kernel logger: [  OK  ]
> >Shutting down system logger: [  OK  ]
> >[compute-0-1] waiting for machine to go down
> >[compute-0-1] waiting for machine to go down
> >[compute-0-1] waiting for machine to come up
> >[compute-0-1] waiting for machine to come up
> >[compute-0-1] waiting for machine to come up
> >[compute-0-1] waiting for machine to come up
> >[compute-0-1] waiting for machine to come up
> >[compute-0-1] waiting for machine to come up
> >[compute-0-1] waiting for machine to come up
> >[compute-0-1] waiting for machine to come up
> >[compute-0-1] waiting for machine to come up
> >[compute-0-1] waiting for machine to come up
> >[compute-0-1] waiting for machine to come up
> >[compute-0-1] waiting for machine to come up
> >[compute-0-1] launching xterm
> >
> >But then I get
> >
> >X connection to localhost:12.0 broken (explicit kill or server shutdown).
> >
> >Is this correct? Or is this because my frontend doesn't have the correct routing on it?
> >
> >Thanks
> >Martyn
> >
> >
> >----------------------------------------------
> >Dr. Martyn D. Wheeler
> >Department of Chemistry
> >University of Leicester
> >University Road
> >Leicester, LE1 7RH, UK.
> >Tel (office): +44 (0)116 252 3985
> >Tel (lab):    +44 (0)116 252 2115
> >Fax:          +44 (0)116 252 3789
> >Email:        martyn.wheeler at le.ac.uk
> >http://www.le.ac.uk/chemistry/staff/mdw10.html
> > 
> >
> >  
> >
> >>-----Original Message-----
> >>From: npaci-rocks-discussion-admin at sdsc.edu
> >>[mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of 
> >>Marcelo Matus
> >>Sent: 24 January 2005 22:29
> >>To: Wheeler, Dr M.D.
> >>Cc: mmatus at dinha.acms.arizona.edu; Huiqun Zhou;
> >>npaci-rocks-discussion at sdsc.edu
> >>Subject: Re: [Rocks-Discuss]Newbie question: alternative method for
> >>installing compute nodes with two NICs?
> >>
> >>
> >>You don't need to add the gateway, unless you really have one,
> >>but you probably also need to add the secondary NIC to the frontend
> >>in the same network segment, ie, something like 192.168.1.100
> >>
> >>If your frontend doesn't have two NICs, the compute nodes will still
> >>talk among themselves, but not to the frontend.
> >>
> >>If your frontend doesn't have two NICs, and you still want to talk
> >>to the compute nodes through the secondary NICs from the frontend,
> >>then you will have to add the route to the frontend and set up a
> >>gateway. You can use one of the compute nodes as the gateway for
> >>that purpose.
> >>
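> >>(A rough, untested sketch of that last option, with compute-0-0 acting
> >>as the gateway:
> >>
> >>  # on compute-0-0: allow forwarding between eth0 and eth1
> >>  echo 1 > /proc/sys/net/ipv4/ip_forward
> >>  # on the frontend: send 192.168.1.0/24 traffic via compute-0-0
> >>  route add -net 192.168.1.0 netmask 255.255.255.0 gw compute-0-0
> >>)
> >>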
> >>Marcelo
> >>
> >>
> >>Wheeler, Dr M.D. wrote:
> >>
> >>>Dear Rocks users,
> >>>
> >>>OK, I have installed an extra NIC on one of my compute nodes with the line
> >>>
> >>>add-extra-nic --if=eth1 --ip=192.168.1.1 --netmask=255.255.255.0 --gateway=192.168.1.254 --name=mpi-0-0 compute-0-0
> >>>
> >>>and I have done a shoot-node on compute-0-0.
> >>>
> >>>I then try to log in to mpi-0-0 using ssh mpi-0-0 and get nothing; likewise with ping, or if I use the IP address.
> >>>
> >>>I think (though I could be wrong, as is often the case) that the problem lies in the definition of the gateway. Does my frontend know how to connect to this subnet?
> >>>
> >>>Any suggestions most welcome....
> >>>
> >>>Thanks in advance,
> >>>
> >>>Martyn
> >>>
> >>>
> >>>________________________________
> >>>
> >>>From: npaci-rocks-discussion-admin at sdsc.edu on behalf of Marcelo Matus
> >>>Sent: Fri 21/01/2005 23:38
> >>>To: Wheeler, Dr M.D.
> >>>Cc: mmatus at dinha.acms.arizona.edu; Huiqun Zhou; npaci-rocks-discussion at sdsc.edu
> >>>Subject: Re: [Rocks-Discuss]Newbie question: alternative method for installing compute nodes with two NICs?
> >>>
> >>>Netpipe is here
> >>>
> >>>  http://www.scl.ameslab.gov/netpipe/
> >>>
> >>>And a nice experiment you can do, and then report to the list, is to run
> >>>two NetPIPE instances using eth0 and eth1 between two nodes. Since you
> >>>have Tyan Opteron motherboards, the total sum of the bandwidth should be
> >>>around double.
> >>>
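> >>>For example, with the plain TCP tester (NPtcp) from NetPIPE -- a sketch,
> >>>the exact options depend on your NetPIPE build:
> >>>
> >>>  # on compute-0-1: start a receiver (restart it before each run)
> >>>  NPtcp
> >>>  # on compute-0-0: one run over the eth0 name, one over the eth1 name
> >>>  NPtcp -h compute-0-1 -o np-eth0.out
> >>>  NPtcp -h compute-mpi-0-1 -o np-eth1.out
> >>>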
> >>>And I guess if you add the extra NICs using add-extra-nic, and then
> >>>shoot-node, it should work. It seems that is the way the other Rocks
> >>>user was doing it, ie, installing each node twice.
> >>>
> >>>About other tweaks: if you call your nodes
> >>>
> >>> compute-0-0 (eth0)  -> compute-mpi-0-0 (eth1)
> >>>
> >>>then you must be sure that when running MPI, the hostnames used on the
> >>>command line are 'compute-mpi-x-x'. You can maybe change/add new SGE
> >>>queues to use the right names, or write a script that translates the
> >>>names properly.
> >>>
> >>>Wheeler, Dr M.D. wrote:
> >>>
> >>>>I do have Tyan Opteron machines with dual onboard NICs, so this might be worth a go.
> >>>>
> >>>>So all I have to do is run add-extra-nic and reinstall each node (can I do a shoot-node after add-extra-nic or do I need to do a full install)?
> >>>>
> >>>>Is there any other tweaking that needs to be done for:
> >>>>
> >>>>>you can use the primary NICs in a network to support all the normal
> >>>>>cluster services like PVFS, NFS, etc, while leaving the
> >>>>>communications (MPI) to the secondary NICs.
> >>>>>
> >>>>Cheers Martyn
> >>>>
> >>>>BTW where do I get NetPipe?
> >>>>
> >>>>________________________________
> >>>>
> >>>>From: npaci-rocks-discussion-admin at sdsc.edu on behalf of Marcelo Matus
> >>>>Sent: Fri 21/01/2005 20:48
> >>>>To: Wheeler, Dr M.D.
> >>>>Cc: mmatus at dinha.acms.arizona.edu; Huiqun Zhou; npaci-rocks-discussion at sdsc.edu
> >>>>Subject: Re: [Rocks-Discuss]Newbie question: alternative method for installing compute nodes with two NICs?
> >>>>
> >>>>If your machines have the NICs connected to the right bus, ie, PCI-X
> >>>>(Tyan Opteron) or PCI-Express (don't know which one), you can use the
> >>>>primary NICs in a network to support all the normal cluster services
> >>>>like PVFS, NFS, etc, while leaving the communications (MPI) to the
> >>>>secondary NICs.
> >>>>
> >>>>Then you can be running MPI and accessing files without competing for
> >>>>network access, ie, you get a larger bandwidth.
> >>>>
> >>>>Now, if your NICs are connected to or using the plain PCI bus, well,
> >>>>there is not much gain, since the PCI bus is already saturated with
> >>>>only one Gigabit NIC.
> >>>>
> >>>>How do you know where your NICs are connected? Run NetPIPE between two
> >>>>nodes, and if the maximum bandwidth you get is
> >>>>
> >>>>200-300 Mbps   ->  PCI 32 bits
> >>>>500-600 Mbps   ->  PCI 64 bits
> >>>>800-900 Mbps   ->  PCI-X or better
> >>>>
> >>>>then, in the last case, it makes a lot of sense to enable the secondary
> >>>>NICs, since two Gigabit connections don't saturate the PCI-X bus, ie,
> >>>>you can in theory have double the maximum bandwidth in your nodes. Now,
> >>>>how you can really use that bandwidth is up to you, but splitting
> >>>>services between them (file/disk access versus MPI communications) is a
> >>>>modest way to start.
> >>>>
> >>>>
> >>>>Marcelo
> >>>>
> >>>>
> >>>>Wheeler, Dr M.D. wrote:
> >>>>
> >>>>>Dear All,
> >>>>>I was just wondering if someone could explain what the advantage of
> >>>>>using both NICs on a node is. Can it speed up parallel jobs? At present
> >>>>>I have only one NIC set up on each of my compute nodes, and it would be
> >>>>>great if adding the other helped speed things up.
> >>>>>
> >>>>>Martyn
> >>>>>
> >>>>>________________________________
> >>>>>
> >>>>>From: npaci-rocks-discussion-admin at sdsc.edu on behalf of Marcelo Matus
> >>>>>Sent: Fri 21/01/2005 19:18
> >>>>>Cc: Huiqun Zhou; npaci-rocks-discussion at sdsc.edu
> >>>>>Subject: Re: [Rocks-Discuss]Newbie question: alternative method for installing compute nodes with two NICs?
> >>>>>
> >>>>>Here is how we installed our cluster nodes with two network interfaces,
> >>>>>where one of the interfaces is InfiniBand. We add to our
> >>>>>extend-compute.xml file something like this:
> >>>>>
> >>>>><!-- Configure ipoib0 -->
> >>>>><post>
> >>>>>#
> >>>>># IPoIB configuration
> >>>>>#
> >>>>>ibbase=192.168.100
> >>>>>ibnetmask=255.255.255.0
> >>>>>ibbroadcast=192.168.100.255
> >>>>>ibmtu=1500
> >>>>>
> >>>>># get eth0 ip address
> >>>>>ipeth0=<var name="Node_Address"/>
> >>>>>ipnumber=`echo $ipeth0 | tr '\.' ' ' | awk '{print $4}'`
> >>>>>
> >>>>># We revert ipnumber: 254 -> 1
> >>>>>ibnumber=`expr 255 - $ipnumber`
> >>>>>
> >>>>>ib-setup --config --ip $ibbase.$ibnumber --netmask $ibnetmask \
> >>>>>  --broadcast $ibbroadcast --mtu $ibmtu
> >>>>></post>
> >>>>>
> >>>>>
> >>>>>As you see, we retrieve the $ipeth0 address and fabricate a new
> >>>>>address for ipoib0, then we set the new address using ib-setup.
> >>>>>
> >>>>>In your case, you can do the same: retrieve the $ipeth0 address,
> >>>>>fabricate an address for eth1, assign it using ifconfig, and also
> >>>>>fabricate the file
> >>>>>
> >>>>>/etc/sysconfig/network-scripts/ifcfg-eth1
> >>>>>
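> >>>>>A rough, untested sketch of what the eth1 version of that <post>
> >>>>>section could look like (the 192.168.1.x addressing is just an
> >>>>>example, adjust to your own secondary network):
> >>>>>
> >>>>><!-- Configure eth1 (sketch) -->
> >>>>><post>
> >>>>># reuse the last octet of the eth0 address on the 192.168.1.0 network
> >>>>>ipeth0=<var name="Node_Address"/>
> >>>>>ipnumber=`echo $ipeth0 | tr '\.' ' ' | awk '{print $4}'`
> >>>>>
> >>>>># write the config file so eth1 comes up on every boot
> >>>>>cfg=/etc/sysconfig/network-scripts/ifcfg-eth1
> >>>>>echo "DEVICE=eth1"                >  $cfg
> >>>>>echo "ONBOOT=yes"                 >> $cfg
> >>>>>echo "BOOTPROTO=static"           >> $cfg
> >>>>>echo "IPADDR=192.168.1.$ipnumber" >> $cfg
> >>>>>echo "NETMASK=255.255.255.0"      >> $cfg
> >>>>>
> >>>>># and bring it up right away
> >>>>>ifconfig eth1 192.168.1.$ipnumber netmask 255.255.255.0 up
> >>>>></post>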
> >>>>>
> >>>>>Then on the frontend you can run a small script to capture all the
> >>>>>eth1 addresses and add them into the cluster database using
> >>>>>add-extra-nic:
> >>>>>
> >>>>>cluster-fork ifconfig eth1 | add-all-eth1-nics
> >>>>>
> >>>>>where you need to write the 'add-all-eth1-nics' script to parse the
> >>>>>cluster-fork/ifconfig output and call add-extra-nic properly.
> >>>>>
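> >>>>>A very rough sketch of such a script -- the cluster-fork output format
> >>>>>here is from memory, so the parsing will almost certainly need
> >>>>>adjusting:
> >>>>>
> >>>>>#!/bin/sh
> >>>>># add-all-eth1-nics (sketch): read "cluster-fork ifconfig eth1" output
> >>>>># on stdin, remember which node each block belongs to, extract the
> >>>>># inet address, and register it with add-extra-nic.
> >>>>>while read line; do
> >>>>>  case "$line" in
> >>>>>    compute-*)
> >>>>>      # cluster-fork prints the node name ahead of its output
> >>>>>      node=`echo $line | sed 's/:.*//'`
> >>>>>      ;;
> >>>>>    *"inet addr:"*)
> >>>>>      ip=`echo $line | sed 's/.*inet addr:\([0-9.]*\).*/\1/'`
> >>>>>      name=`echo $node | sed 's/^compute/compute-mpi/'`
> >>>>>      add-extra-nic --if=eth1 --ip=$ip --netmask=255.255.255.0 \
> >>>>>        --name=$name $node
> >>>>>      ;;
> >>>>>  esac
> >>>>>done
> >>>>>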
> >>>>>If not, you can add the eth1 addresses manually with 'add-extra-nic',
> >>>>>as we are doing for now, since we have very few nodes in our testing
> >>>>>stage.
> >>>>>
> >>>>>Anyway, using add-extra-nic manually or not, you don't need to
> >>>>>reinstall the compute nodes twice.
> >>>>>
> >>>>>Marcelo
> >>>>>
> >>>>>
> >>>>>Mason J. Katz wrote:
> >>>>>
> >>>>>>This is correct, and we feel your pain.  But for the current release
> >>>>>>the only way to enable the second NIC is to install once, run
> >>>>>>add-extra-nic and install again.  This will hopefully change in the
> >>>>>>not so distant future (but not the next release).
> >>>>>>
> >>>>>> -mjk
> >>>>>>
> >>>>>>On Jan 21, 2005, at 2:13 AM, Huiqun Zhou wrote:
> >>>>>>
> >>>>>>>Hi,
> >>>>>>>
> >>>>>>>Do we have a more convenient method for installing compute nodes with
> >>>>>>>two NICs? Currently I have to install compute nodes twice to get the
> >>>>>>>second NIC configured. add-extra-nic cannot add information about the
> >>>>>>>second NIC into the database because there are no records for the
> >>>>>>>compute nodes right after the frontend was installed. Although I have
> >>>>>>>only 8 compute nodes, installation took me more than an hour.
> >>>>>>>
> >>>>>>>I don't know if I missed some information in the manual; please give
> >>>>>>>me an idea.
> >>>>>>>
> >>>>>>>Regards,
> >>>>>>>
> >>>>>>>
> >>>>>>>Huiqun Zhou



More information about the npaci-rocks-discussion mailing list