[Mpi-forum] The MPI Internal error running on Hopper
Xuefei (Rebecca) Yuan
xyuan at lbl.gov
Sat Jul 30 09:54:22 CDT 2011
Hello, all,
I got several MPI internal errors while running on a Cray XE6 machine (Hopper). The error messages read:
Rank 9 [Sat Jul 30 07:39:14 2011] [c5-2c2s1n3] Fatal error in PMPI_Wait: Other MPI error, error stack:
PMPI_Wait(179).....................: MPI_Wait(request=0x7fffffff7438, status0x7fffffff7460) failed
MPIR_Wait_impl(69).................:
MPIDI_CH3I_Progress(370)...........:
MPIDI_CH3_PktHandler_EagerSend(606): Failed to allocate memory for an unexpected message. 0 unexpected messages queued.
Rank 63 [Sat Jul 30 07:39:14 2011] [c0-2c2s3n0] Fatal error in MPI_Irecv: Other MPI error, error stack:
MPI_Irecv(147): MPI_Irecv(buf=0x4a81890, count=52, MPI_DOUBLE, src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, comm=0x84000007, request=0x7fffffff7438) failed
MPID_Irecv(53): failure occurred while allocating memory for a request object
Rank 54 [Sat Jul 30 07:39:14 2011] [c1-2c2s3n2] Fatal error in PMPI_Isend: Internal MPI error!, error stack:
PMPI_Isend(148): MPI_Isend(buf=0x3d12a350, count=52, MPI_DOUBLE, dest=30, tag=21, comm=0xc4000003, request=0x3c9c12f0) failed
(unknown)(): Internal MPI error!
Rank 45 [Sat Jul 30 07:39:14 2011] [c1-2c2s2n3] Fatal error in PMPI_Isend: Internal MPI error!, error stack:
PMPI_Isend(148): MPI_Isend(buf=0x3c638de0, count=34, MPI_DOUBLE, dest=61, tag=21, comm=0x84000007, request=0x3c03be90) failed
(unknown)(): Internal MPI error!
Rank 36 [Sat Jul 30 07:39:14 2011] [c3-2c2s2n1] Fatal error in PMPI_Isend: Internal MPI error!, error stack:
PMPI_Isend(148): MPI_Isend(buf=0x3caaf170, count=52, MPI_DOUBLE, dest=28, tag=21, comm=0xc4000003, request=0x3c2e561c) failed
(unknown)(): Internal MPI error!
_pmii_daemon(SIGCHLD): [NID 00102] [c0-2c2s3n0] [Sat Jul 30 07:39:14 2011] PE 63 exit signal Aborted
_pmii_daemon(SIGCHLD): [NID 06043] [c3-2c2s2n1] [Sat Jul 30 07:39:14 2011] PE 36 exit signal Aborted
_pmii_daemon(SIGCHLD): [NID 06328] [c1-2c2s3n2] [Sat Jul 30 07:39:14 2011] PE 54 exit signal Aborted
_pmii_daemon(SIGCHLD): [NID 05565] [c5-2c2s1n3] [Sat Jul 30 07:39:14 2011] PE 9 exit signal Aborted
_pmii_daemon(SIGCHLD): [NID 06331] [c1-2c2s2n3] [Sat Jul 30 07:39:14 2011] PE 45 exit signal Aborted
[NID 00102] 2011-07-30 07:39:38 Apid 2986821: initiated application termination
So I looked up the runtime environment parameters for Hopper at
https://www.nersc.gov/users/computational-systems/hopper/running-jobs/runtime-tuning-options/#toc-anchor-1
I tried to increase MPI_GNI_MAX_EAGER_MSG_SIZE from 8192 to 131070, but it did not help.
Any suggestions on how I could resolve these errors in MPI_Irecv() and MPI_Isend()?
Thanks very much!
Xuefei (Rebecca) Yuan
Postdoctoral Fellow
Lawrence Berkeley National Laboratory
Tel: 1-510-486-7031