[Mpi-forum] MPI program on Cluster: Errors

Arsalan Shahid arsalan.shahid at hitecuni.edu.pk
Wed Oct 7 07:10:50 CDT 2015


Hello,

We have started running MPI programs on a cluster and are currently facing
some issues. As a start, we have set up a Beowulf cluster on two PCs. A
simple "hello world" program runs fine on both the master and slave nodes.
However, when we run MPI programs that use communicators and message-passing
routines, we get the errors shown below. If we run the same program with
1 to 4 processes, the code works fine. I wonder if someone can help sort
out this issue?
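
For reference, here is a minimal sketch of the kind of program that triggers
this. It is illustrative only, not our actual MPMPI source; it just does
point-to-point sends/receives on MPI_COMM_WORLD followed by MPI_Barrier,
which is the collective that aborts in the error stack below:

    /* Minimal sketch (not the actual MPMPI code): point-to-point message
     * passing followed by a barrier on MPI_COMM_WORLD. Build with mpicc,
     * run with mpirun as shown below. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* Rank 0 sends a token to every other rank. */
            for (int dest = 1; dest < size; dest++) {
                value = dest;
                MPI_Send(&value, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
            }
        } else {
            /* Every other rank receives the token from rank 0. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        /* This barrier is where the run aborts once ranks on the
         * slave node are involved (rank 4 and above). */
        MPI_Barrier(MPI_COMM_WORLD);

        printf("rank %d of %d done (value=%d)\n", rank, size, value);
        MPI_Finalize();
        return 0;
    }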

xenox at master:/nfs$ mpirun -np 8 -f machinefile ./MPMPI 4
Fatal error in PMPI_Barrier: A process has failed, error stack:
PMPI_Barrier(428).........: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(335)....: Failure during collective
MPIR_Barrier_impl(328)....:
MPIR_Barrier(292).........:
MPIR_Barrier_intra(149)...:
barrier_smp_intra(94).....:
MPIR_Barrier_impl(335)....: Failure during collective
MPIR_Barrier_impl(328)....:
MPIR_Barrier(292).........:
MPIR_Barrier_intra(169)...:
dequeue_and_set_error(865): Communication error with rank 4
barrier_smp_intra(109)....:
MPIR_Bcast_impl(1458).....:
MPIR_Bcast(1482)..........:
MPIR_Bcast_intra(1291)....:
MPIR_Bcast_binomial(309)..: Failure during collective
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428).......: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(335)..: Failure during collective
MPIR_Barrier_impl(328)..:
MPIR_Barrier(292).......:
MPIR_Barrier_intra(149).:
barrier_smp_intra(109)..:
MPIR_Bcast_impl(1458)...:
MPIR_Bcast(1482)........:
MPIR_Bcast_intra(1291)..:
MPIR_Bcast_binomial(309): Failure during collective
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428).......: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(335)..: Failure during collective
MPIR_Barrier_impl(328)..:
MPIR_Barrier(292).......:
MPIR_Barrier_intra(149).:
barrier_smp_intra(109)..:
MPIR_Bcast_impl(1458)...:
MPIR_Bcast(1482)........:
MPIR_Bcast_intra(1291)..:
MPIR_Bcast_binomial(309): Failure during collective
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428).......: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(335)..: Failure during collective
MPIR_Barrier_impl(328)..:
MPIR_Barrier(292).......:
MPIR_Barrier_intra(149).:
barrier_smp_intra(109)..:
MPIR_Bcast_impl(1458)...:
MPIR_Bcast(1482)........:
MPIR_Bcast_intra(1291)..:
MPIR_Bcast_binomial(309): Failure during collective

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 15777 RUNNING AT master
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1 at slave1] HYD_pmcd_pmip_control_cmd_cb
(pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:1 at slave1] HYDT_dmxu_poll_wait_for_event
(tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1 at slave1] main (pm/pmiserv/pmip.c:206): demux engine error
waiting for event
[mpiexec at master] HYDT_bscu_wait_for_completion
(tools/bootstrap/utils/bscu_wait.c:76): one of the processes
terminated badly; aborting
[mpiexec at master] HYDT_bsci_wait_for_completion
(tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting
for completion
[mpiexec at master] HYD_pmci_wait_for_completion
(pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for
completion

[mpiexec at master] main (ui/mpich/mpiexec.c:336): process manager error
waiting for completion

-Regards
-- 
Arsalan Shahid
RA, ACAL Research Centre
Department of Electrical Engineering
HITEC University, Taxila Cantt, Islamabad, Pakistan