[mpiwg-sessions] MPICH/hydra happier with Dan's test cases (kind of)
Pritchard Jr., Howard
howardp at lanl.gov
Sun Sep 16 23:39:23 CDT 2018
Hi Folks,
I made some minor corrections to the two problem test cases, and they now work with MPICH. I opened a PR and assigned Dan as the reviewer.
Howard
--
Howard Pritchard
B Schedule
HPC-ENV
Office 9, 2nd floor Research Park
TA-03, Building 4200, Room 203
Los Alamos National Laboratory
From: mpiwg-sessions <mpiwg-sessions-bounces at lists.mpi-forum.org<mailto:mpiwg-sessions-bounces at lists.mpi-forum.org>> on behalf of MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org<mailto:mpiwg-sessions at lists.mpi-forum.org>>
Reply-To: MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org<mailto:mpiwg-sessions at lists.mpi-forum.org>>
Date: Sunday, September 16, 2018 at 10:10 PM
To: MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org<mailto:mpiwg-sessions at lists.mpi-forum.org>>
Cc: Howard Pritchard <howardp at lanl.gov<mailto:howardp at lanl.gov>>
Subject: [mpiwg-sessions] MPICH/hydra happier with Dan's test cases (kind of)
Hi Folks,
MPICH/hydra is happy with libCFG_noMCW, even with all the sleeps
removed. I ran up to 36 ranks with hydra/mpich and didn't see a problem.
It's not so happy with the other tests; I think they are buggy.
Here’s what I get for libCFG_noMCW_multiport
hpp at sn-fey1:/usr/projects/hpctools/hpp/mpi_sessions_code_sandbox>mpiexec -n 8 ./libCFG_noMCW_multiport
process 1 (MPI_COMM_WORLD) now calling barrier on MPI_COMM_WORLD
process 3 (MPI_COMM_WORLD) now calling barrier on MPI_COMM_WORLD
process 4 (MPI_COMM_WORLD) calling from group thingy
rank 4 non-trivial use-case: target group size 4, localGroup size 1
rank 4 opening port
process 5 (MPI_COMM_WORLD) now calling barrier on MPI_COMM_WORLD
process 6 (MPI_COMM_WORLD) calling from group thingy
rank 6 non-trivial use-case: target group size 4, localGroup size 1
rank 6 opening port
process 7 (MPI_COMM_WORLD) now calling barrier on MPI_COMM_WORLD
process 0 (MPI_COMM_WORLD) calling from group thingy
rank 0 non-trivial use-case: target group size 4, localGroup size 1
process 2 (MPI_COMM_WORLD) calling from group thingy
rank 2 non-trivial use-case: target group size 4, localGroup size 1
rank 2 opening port
rank 2 opened port tag#0$description#sn-fey1.lanl.gov$port#55566$ifname#128.165.227.181$
rank 2 publishing port tag#0$description#sn-fey1.lanl.gov$port#55566$ifname#128.165.227.181$ using name foobar10 round 1
rank 4 opened port tag#0$description#sn-fey1.lanl.gov$port#45508$ifname#128.165.227.181$
rank 4 publishing port tag#0$description#sn-fey1.lanl.gov$port#45508$ifname#128.165.227.181$ using name foobar10 round 2
rank 6 opened port tag#0$description#sn-fey1.lanl.gov$port#52388$ifname#128.165.227.181$
rank 6 publishing port tag#0$description#sn-fey1.lanl.gov$port#52388$ifname#128.165.227.181$ using name foobar10 round 3
rank 2 published port tag#0$description#sn-fey1.lanl.gov$port#55566$ifname#128.165.227.181$ using name foobar10 round 1
rank 2 accepting on port tag#0$description#sn-fey1.lanl.gov$port#55566$ifname#128.165.227.181$ (localSize 1)
Fatal error in PMPI_Publish_name: Invalid service name (see MPI_Publish_name), error stack:
PMPI_Publish_name(134): MPI_Publish_name(service="foobar10 round 2", MPI_INFO_NULL, port="tag#0$description#sn-fey1.lanl.gov$port#45508$ifname#128.165.227.181$") failed
MPID_NS_Publish(67)...: Lookup failed for service name foobar10 round 2
Fatal error in PMPI_Publish_name: Invalid service name (see MPI_Publish_name), error stack:
PMPI_Publish_name(134): MPI_Publish_name(service="foobar10 round 3", MPI_INFO_NULL, port="tag#0$description#sn-fey1.lanl.gov$port#52388$ifname#128.165.227.181$") failed
MPID_NS_Publish(67)...: Lookup failed for service name foobar10 round 3
If I have a chance I'll play with this test and see if I can get it to work. Hmm, maybe hydra doesn't like whitespace in the published service names.
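For reference, here is a minimal sketch (not Dan's actual test) of the open/publish pattern the log above exercises, with the whitespace replaced by an underscore in the service name. The service name "foobar10_round_%d" and the round variable are made up for illustration; the real test presumably derives one name per accepting rank.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    char service[64];

    MPI_Init(&argc, &argv);

    /* Hypothetical round number; the real test computes one per accepting rank. */
    int round = 1;
    snprintf(service, sizeof(service), "foobar10_round_%d", round);

    /* Open a port and publish it under a name with no embedded spaces,
       which is the variation being suggested for hydra's name service. */
    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name(service, MPI_INFO_NULL, port);

    /* ... MPI_Comm_accept(port, ...) on this side, and
       MPI_Lookup_name(service, ...) + MPI_Comm_connect on the other ... */

    MPI_Unpublish_name(service, MPI_INFO_NULL, port);
    MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}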
Howard
--
Howard Pritchard
B Schedule
HPC-ENV
Office 9, 2nd floor Research Park
TA-03, Building 4200, Room 203
Los Alamos National Laboratory