<div dir="ltr"><div class="gmail_default" style="font-family:'courier new',monospace;font-size:small;color:rgb(0,0,0)">Hello, </div><div class="gmail_default" style="font-family:'courier new',monospace;font-size:small;color:rgb(0,0,0)"><br></div><div class="gmail_default" style="font-family:'courier new',monospace;font-size:small;color:rgb(0,0,0)">We have started to run MPI programs on cluster. Currently, we have been facing some issues while running programs on cluster. As a start, we have setup Beowulf cluster on two PC's. Simple "hello world" program runs perfectly fine on cluster on both Master and slave nodes. But, while running MPI programs using communicators and message passing routines, we are getting errors as below. Moreover, if we run the same program with number of cores equal to 1-4, the code works fine. I wonder if someone can sort out this issue ? </div><pre class="">xenox@master:/nfs$ mpirun -np 8 -f machinefile ./MPMPI 4
Fatal error in PMPI_Barrier: A process has failed, error stack:
PMPI_Barrier(428).........: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(335)....: Failure during collective
MPIR_Barrier_impl(328)....:
MPIR_Barrier(292).........:
MPIR_Barrier_intra(149)...:
barrier_smp_intra(94).....:
MPIR_Barrier_impl(335)....: Failure during collective
MPIR_Barrier_impl(328)....:
MPIR_Barrier(292).........:
MPIR_Barrier_intra(169)...:
dequeue_and_set_error(865): Communication error with rank 4
barrier_smp_intra(109)....:
MPIR_Bcast_impl(1458).....:
MPIR_Bcast(1482)..........:
MPIR_Bcast_intra(1291)....:
MPIR_Bcast_binomial(309)..: Failure during collective
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428).......: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(335)..: Failure during collective
MPIR_Barrier_impl(328)..:
MPIR_Barrier(292).......:
MPIR_Barrier_intra(149).:
barrier_smp_intra(109)..:
MPIR_Bcast_impl(1458)...:
MPIR_Bcast(1482)........:
MPIR_Bcast_intra(1291)..:
MPIR_Bcast_binomial(309): Failure during collective
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428).......: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(335)..: Failure during collective
MPIR_Barrier_impl(328)..:
MPIR_Barrier(292).......:
MPIR_Barrier_intra(149).:
barrier_smp_intra(109)..:
MPIR_Bcast_impl(1458)...:
MPIR_Bcast(1482)........:
MPIR_Bcast_intra(1291)..:
MPIR_Bcast_binomial(309): Failure during collective
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428).......: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(335)..: Failure during collective
MPIR_Barrier_impl(328)..:
MPIR_Barrier(292).......:
MPIR_Barrier_intra(149).:
barrier_smp_intra(109)..:
MPIR_Bcast_impl(1458)...:
MPIR_Bcast(1482)........:
MPIR_Bcast_intra(1291)..:
MPIR_Bcast_binomial(309): Failure during collective
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 15777 RUNNING AT master
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1@slave1] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:1@slave1] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@slave1] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@master] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@master] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@master] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@master] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion

-Regards

--
-Arsalan Shahid
 RA, ACAL Research centre
 Department of Electrical Engineering
 HITEC University-Taxila Cantt-Islamabad-Pakistan
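
P.S. Our actual MPMPI source is longer, but the following minimal sketch (an assumed example I put together for illustration, not our real code) shows the kind of collective calls over MPI_COMM_WORLD (MPI_Bcast and MPI_Barrier) that produce the error for us whenever we launch with more than 4 processes:

/* minimal_collectives.c - hypothetical reproducer, not the real MPMPI code */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        value = 42;                       /* root sets the value to broadcast */

    /* Collective operations spanning both nodes: with -np 8 these are
       where the "Communication error with rank 4" appears for us. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);

    printf("rank %d of %d received %d\n", rank, size, value);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and run exactly as shown above (mpirun -np 8 -f machinefile ...), this style of program fails for us once ranks are placed on the second node, while -np 1 through 4 (all ranks on the master) works.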