[Mpi3-tools] Next MPI-3 Tools WG meeting Monday 8/26 + Schedule for next meetings
John.DelSignore at roguewave.com
Mon Aug 9 13:19:30 CDT 2010
Dong H. Ahn wrote:
> Hi John,
> I just looked at the revision and I think this addressed most of the
> issues I've raised except for the MPIR_attach_fifo support.
Sorry, I forgot about MPIR_attach_fifo. I had to scrounge around in my email to find the description. Here's what I have, from an email you sent on 7/9/10:
... I would also hope to
cover a recently implemented MPIR extension on OpenRTE/OpenMPI:
Definition is not required.
Definition is contained within the address space of the starter process.
Variable is written by the tool, and read by the starter process.
MPIR_attach_fifo is a null-terminated character string that is written by the tool into the
address space of the starter process. The string is the path name of a FIFO (named pipe). Writing a byte
into this FIFO will cause the starter process to start to monitor MPIR_being_debugged in attach mode.
As part of LaunchMON/STAT OpenRTE/OpenMPI port, Ralph Castain reproduced
BlueGene's MPIR colocation extension but there was a performance
requirement of orterun that kept orterun from polling on the value
changes of MPIR_being_debugged for attach case by default. So we
addressed it by making orterun open up a FIFO and its comm thread block
on the FIFO along with other TCP channels; when a debugger attaches to
orterun and writes a byte into it, the starter process looks at
MPIR_being_debugged and performs the co-location service.
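
The handshake described above can be sketched as follows. This is only an illustration under stated assumptions, not the OpenRTE/orterun implementation: it is written in Python rather than C, both sides run in one process, and it assumes the starter creates the FIFO. The names Starter, tool_attach, demo and colocation_runs are hypothetical, and the being_debugged attribute merely stands in for MPIR_being_debugged, which a real tool would set through its debugger interface to the starter's address space. It needs a POSIX system with named pipes.

```python
import os
import select
import tempfile


class Starter:
    """Hypothetical stand-in for a starter process such as orterun."""

    def __init__(self, fifo_path):
        self.fifo_path = fifo_path
        self.being_debugged = 0   # stands in for MPIR_being_debugged
        self.colocation_runs = 0  # counts how often the service was triggered
        os.mkfifo(fifo_path)      # assumption: the starter creates the FIFO
        # O_NONBLOCK lets the open succeed before any writer exists.
        self.fd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)

    def wait_for_attach(self, timeout):
        """Wait on the FIFO instead of polling the flag; True if a byte arrived."""
        readable, _, _ = select.select([self.fd], [], [], timeout)
        if not readable:
            return False
        if not os.read(self.fd, 1):
            return False          # end-of-file: no byte was written
        if self.being_debugged:   # only now does the starter look at the flag
            self.colocation_runs += 1
        return True

    def close(self):
        os.close(self.fd)
        os.unlink(self.fifo_path)  # assumption: the starter removes the FIFO


def tool_attach(starter):
    """Hypothetical tool side: set the flag, then write one byte to the FIFO."""
    starter.being_debugged = 1
    fd = os.open(starter.fifo_path, os.O_WRONLY)
    try:
        os.write(fd, b"\x01")
    finally:
        os.close(fd)


def demo():
    workdir = tempfile.mkdtemp()
    starter = Starter(os.path.join(workdir, "attach_fifo"))
    try:
        before = starter.wait_for_attach(timeout=0)
        tool_attach(starter)
        after = starter.wait_for_attach(timeout=1.0)
        return before, after, starter.colocation_runs
    finally:
        starter.close()
        os.rmdir(workdir)


if __name__ == "__main__":
    print(demo())
```

In a real starter the FIFO descriptor would sit in the same select/poll set as the TCP channels, so the comm thread pays no polling cost until a tool writes the byte.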
> If it is
> missing from this revision because of a lack of technical details, Ralph
> and I should be able to provide them.
Given that I know zilch about MPIR_attach_fifo, it would make sense for someone else (not me) to write up the description and incorporate it into the document. Since it is an extension, I think it should be added in sections 7 and 9. Sections 7.3 and 9.19 seem appropriate places for the additions.
Finally, I'd like a little clarification on this extension...
1) Who creates and owns the FIFO? Is it the starter process? On what node is the FIFO created? My concern is this: If the starter process is being debugged remotely (e.g., totalview -r remotehost mpiexec), does the debugger have to open the FIFO on the remote host? I'd assume so, which means that the debugger has to extend its remote debugging protocol to execute this "write to FIFO" operation.
2) Who is responsible for deleting the FIFO? What happens if the debugger forcibly kills the starter process? Is the FIFO leaked?
3) Why didn't you use a socket or a signal instead of a FIFO? With a socket or signal, you don't have lifetime issues. With a signal, you don't have remote debugging issues.
Cheers, John D.
> On 8/6/2010 11:58 AM, John DelSignore wrote:
>> Attached is the most recent draft (8/6/2010) of the MPIR document for discussion during the 8/9 meeting.
>> I updated it to reflect the comments I received in email on the 6/11/2010 draft that I felt warranted changing the document.
>> I also copied and pasted Jeff's mpimsgq_dll_locations description from the LaTeX file and the MPI website; this is intended as a placeholder until someone (not me) has time to clean up the text to fit into the document.
>> The "Diffs" document shows the changes relative to the 6/11/2010 draft.
>> Cheers, John D.
>> Jeff Squyres wrote:
>>> On Jul 25, 2010, at 5:23 PM, Martin Schulz wrote:
>>>> Here is the proposed schedule for the upcoming meetings after tomorrow:
>>>> 8/9 - Final discussion of the MPIR document
>>>> 8/23 - Discussion of the completed/integrated MPIT document
>>>> 9/6 - Labor day - no meeting
>>>> 9/16-9/18 - MPI forum meeting in Stuttgart
>>>> First reading of the MPIR document
>>>> 9/20 - Feedback from the MPI forum, MPIT discussions
>>> Webex links for these meetings are now up on the wiki.