[mpiwg-sessions] MPICH/hydra happier with Dan's test cases (kind of)

Pritchard Jr., Howard howardp at lanl.gov
Sun Sep 16 23:39:23 CDT 2018


Hi Folks,

I made some minor corrections to the two problem test cases and they now work
with mpich.  I opened a PR and assigned Dan as the reviewer.

Howard

--
Howard Pritchard
B Schedule
HPC-ENV
Office 9, 2nd floor Research Park
TA-03, Building 4200, Room 203
Los Alamos National Laboratory


From: mpiwg-sessions <mpiwg-sessions-bounces at lists.mpi-forum.org> on behalf of MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>
Reply-To: MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>
Date: Sunday, September 16, 2018 at 10:10 PM
To: MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>
Cc: Howard Pritchard <howardp at lanl.gov>
Subject: [mpiwg-sessions] MPICH/hydra happier with Dan's test cases (kind of)

Hi Folks,

MPICH/hydra is happy with libCFG_noMCW, even with all the sleeps
removed. I ran up to 36 ranks with hydra/mpich and didn’t see a problem.

It's not so happy with the other tests; I think they are buggy.

Here’s what I get for libCFG_noMCW_multiport


hpp@sn-fey1:/usr/projects/hpctools/hpp/mpi_sessions_code_sandbox>mpiexec -n 8 ./libCFG_noMCW_multiport
process 1 (MPI_COMM_WORLD) now calling barrier on MPI_COMM_WORLD
process 3 (MPI_COMM_WORLD) now calling barrier on MPI_COMM_WORLD
process 4 (MPI_COMM_WORLD) calling from group thingy
rank 4 non-trivial use-case: target group size 4, localGroup size 1
rank 4 opening port
process 5 (MPI_COMM_WORLD) now calling barrier on MPI_COMM_WORLD
process 6 (MPI_COMM_WORLD) calling from group thingy
rank 6 non-trivial use-case: target group size 4, localGroup size 1
rank 6 opening port
process 7 (MPI_COMM_WORLD) now calling barrier on MPI_COMM_WORLD
process 0 (MPI_COMM_WORLD) calling from group thingy
rank 0 non-trivial use-case: target group size 4, localGroup size 1
process 2 (MPI_COMM_WORLD) calling from group thingy
rank 2 non-trivial use-case: target group size 4, localGroup size 1
rank 2 opening port
rank 2 opened port tag#0$description#sn-fey1.lanl.gov$port#55566$ifname#128.165.227.181$
rank 2 publishing port tag#0$description#sn-fey1.lanl.gov$port#55566$ifname#128.165.227.181$ using name foobar10 round 1
rank 4 opened port tag#0$description#sn-fey1.lanl.gov$port#45508$ifname#128.165.227.181$
rank 4 publishing port tag#0$description#sn-fey1.lanl.gov$port#45508$ifname#128.165.227.181$ using name foobar10 round 2
rank 6 opened port tag#0$description#sn-fey1.lanl.gov$port#52388$ifname#128.165.227.181$
rank 6 publishing port tag#0$description#sn-fey1.lanl.gov$port#52388$ifname#128.165.227.181$ using name foobar10 round 3
rank 2 published port tag#0$description#sn-fey1.lanl.gov$port#55566$ifname#128.165.227.181$ using name foobar10 round 1
rank 2 accepting on port tag#0$description#sn-fey1.lanl.gov$port#55566$ifname#128.165.227.181$ (localSize 1)
Fatal error in PMPI_Publish_name: Invalid service name (see MPI_Publish_name), error stack:
PMPI_Publish_name(134): MPI_Publish_name(service="foobar10 round 2", MPI_INFO_NULL, port="tag#0$description#sn-fey1.lanl.gov$port#45508$ifname#128.165.227.181$") failed
MPID_NS_Publish(67)...: Lookup failed for service name foobar10 round 2
Fatal error in PMPI_Publish_name: Invalid service name (see MPI_Publish_name), error stack:
PMPI_Publish_name(134): MPI_Publish_name(service="foobar10 round 3", MPI_INFO_NULL, port="tag#0$description#sn-fey1.lanl.gov$port#52388$ifname#128.165.227.181$") failed
MPID_NS_Publish(67)...: Lookup failed for service name foobar10 round 3


If I have a chance I'll play with this test and see if I can get it to work.  Hmm... maybe hydra's name server doesn't like whitespace in the publish names.
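If the whitespace theory is right, something along these lines should keep hydra's name server happy. This is just a rough sketch, not Dan's actual test; the service name, the round number, and the rank-0 guard are placeholders of mine:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    char service_name[64];
    int rank, round = 2;   /* placeholder round number */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* build a service name with no whitespace, e.g. "foobar10_round_2"
           instead of "foobar10 round 2" */
        snprintf(service_name, sizeof(service_name), "foobar10_round_%d", round);

        MPI_Open_port(MPI_INFO_NULL, port_name);
        MPI_Publish_name(service_name, MPI_INFO_NULL, port_name);
        printf("rank %d published %s as %s\n", rank, port_name, service_name);

        /* MPI_Comm_accept()/MPI_Comm_connect() would go here in the real test */

        MPI_Unpublish_name(service_name, MPI_INFO_NULL, port_name);
        MPI_Close_port(port_name);
    }
    MPI_Finalize();
    return 0;
}

If that works, the fix for the test would just be swapping the spaces for underscores (or some other separator) when building the publish names.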

Howard


--
Howard Pritchard
B Schedule
HPC-ENV
Office 9, 2nd floor Research Park
TA-03, Building 4200, Room 203
Los Alamos National Laboratory
