[Mpi-forum] The MPI Internal error running on Hopper

Jeff Hammond jeff.science at gmail.com
Sat Jul 30 10:17:09 CDT 2011


Report this to NERSC support. This is not the appropriate email list for
support of MPI implementations.

Cray MPI is an MPICH2-based implementation, so you can also try
mpich-discuss at mcs.anl.gov, but it is still preferable to contact NERSC
first, since they hold the Cray support contract for Hopper.

Jeff

Sent from my iPhone

On Jul 30, 2011, at 9:54 AM, "Xuefei (Rebecca) Yuan" <xyuan at lbl.gov> wrote:

> Hello, all,
>
> I got some MPI internal error while running on a Cray XE6 machine (Hopper), the error message reads:
>
>
> Rank 9 [Sat Jul 30 07:39:14 2011] [c5-2c2s1n3] Fatal error in PMPI_Wait: Other MPI error, error stack:
> PMPI_Wait(179).....................: MPI_Wait(request=0x7fffffff7438, status=0x7fffffff7460) failed
> MPIR_Wait_impl(69).................:
> MPIDI_CH3I_Progress(370)...........:
> MPIDI_CH3_PktHandler_EagerSend(606): Failed to allocate memory for an unexpected message. 0 unexpected messages queued.
> Rank 63 [Sat Jul 30 07:39:14 2011] [c0-2c2s3n0] Fatal error in MPI_Irecv: Other MPI error, error stack:
> MPI_Irecv(147): MPI_Irecv(buf=0x4a81890, count=52, MPI_DOUBLE, src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, comm=0x84000007, request=0x7fffffff7438) failed
> MPID_Irecv(53): failure occurred while allocating memory for a request object
> Rank 54 [Sat Jul 30 07:39:14 2011] [c1-2c2s3n2] Fatal error in PMPI_Isend: Internal MPI error!, error stack:
> PMPI_Isend(148): MPI_Isend(buf=0x3d12a350, count=52, MPI_DOUBLE, dest=30, tag=21, comm=0xc4000003, request=0x3c9c12f0) failed
> (unknown)(): Internal MPI error!
> Rank 45 [Sat Jul 30 07:39:14 2011] [c1-2c2s2n3] Fatal error in PMPI_Isend: Internal MPI error!, error stack:
> PMPI_Isend(148): MPI_Isend(buf=0x3c638de0, count=34, MPI_DOUBLE, dest=61, tag=21, comm=0x84000007, request=0x3c03be90) failed
> (unknown)(): Internal MPI error!
> Rank 36 [Sat Jul 30 07:39:14 2011] [c3-2c2s2n1] Fatal error in PMPI_Isend: Internal MPI error!, error stack:
> PMPI_Isend(148): MPI_Isend(buf=0x3caaf170, count=52, MPI_DOUBLE, dest=28, tag=21, comm=0xc4000003, request=0x3c2e561c) failed
> (unknown)(): Internal MPI error!
> _pmii_daemon(SIGCHLD): [NID 00102] [c0-2c2s3n0] [Sat Jul 30 07:39:14 2011] PE 63 exit signal Aborted
> _pmii_daemon(SIGCHLD): [NID 06043] [c3-2c2s2n1] [Sat Jul 30 07:39:14 2011] PE 36 exit signal Aborted
> _pmii_daemon(SIGCHLD): [NID 06328] [c1-2c2s3n2] [Sat Jul 30 07:39:14 2011] PE 54 exit signal Aborted
> _pmii_daemon(SIGCHLD): [NID 05565] [c5-2c2s1n3] [Sat Jul 30 07:39:14 2011] PE 9 exit signal Aborted
> _pmii_daemon(SIGCHLD): [NID 06331] [c1-2c2s2n3] [Sat Jul 30 07:39:14 2011] PE 45 exit signal Aborted
> [NID 00102] 2011-07-30 07:39:38 Apid 2986821: initiated application termination
>
> So I checked up the environment parameters on hopper at
>
> https://www.nersc.gov/users/computational-systems/hopper/running-jobs/runtime-tuning-options/#toc-anchor-1
>
> I tried to increase MPI_GNI_MAX_EAGER_MSG_SIZE from 8192 to 131070, but it did not help.
>
> Any suggestions on how I could resolve this error for MPI_Irecv() and MPI_Isend()?
>
> Thanks very much!
>
>
> Xuefei (Rebecca) Yuan
> Postdoctoral Fellow
> Lawrence Berkeley National Laboratory
> Tel: 1-510-486-7031
>
>
>
> _______________________________________________
> mpi-forum mailing list
> mpi-forum at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum
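
The tuning Rebecca describes could be sketched as a batch-script fragment. This is a hedged sketch, not a confirmed fix: the variable names below (MPICH_UNEX_BUFFER_SIZE, MPICH_GNI_MAX_EAGER_MSG_SIZE) are taken from Cray MPT documentation of that era and should be verified against the NERSC runtime-tuning page linked above; the values are illustrative only. Note that since the failure is allocation of memory for *unexpected* eager messages, lowering the eager cutoff (rather than raising it) may reduce the pressure, because large messages then take the rendezvous path instead of being buffered.

```shell
# Hypothetical fragment of a Hopper (Cray XE6) batch script.
# Variable names are assumptions from Cray MPT docs of that era --
# check them against the NERSC runtime-tuning page before use.

# Enlarge the buffer that holds unexpected (early-arriving) eager messages;
# "Failed to allocate memory for an unexpected message" suggests it filled up.
export MPICH_UNEX_BUFFER_SIZE=268435456     # 256 MB, illustrative value

# Lowering (not raising) the eager/rendezvous cutoff makes larger messages
# use the rendezvous path, so they are not buffered as unexpected messages.
export MPICH_GNI_MAX_EAGER_MSG_SIZE=2048    # bytes, illustrative value
```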
