[Mpi3-tools] DRAFT of the MPIR Process Acquisition Interface document

Fri May 14 12:51:40 CDT 2010

David Lecomber wrote:
> Hi John,
> 
> Thanks for circulating.  
> 
> I have one query - for the Tool Daemon Launch Extension as described, we
> get a binary executed out on the nodes with arguments.  Could we/should
> we push for more?

Remember, this paper attempts to describe where we are, not where we want to be. It describes current practice.

"Know from whence you came. If you know whence you came, there are absolutely no limitations to where you can go." -- James Baldwin

> My preference - and I think other tools would benefit - would be
> for having arbitrary files shipped out to compute nodes, as well as
> optionally executing these files.

Sure, that sounds useful, but the MPIR interface as it stands now has no provision for that, and I'm not aware of any vendor extension to MPIR that provides that functionality. So... if that's what is wanted, someone will need to write a separate proposal, implement it, and get it through the standardization process.

> This would allow rapid sending of shared libraries, topology files, etc.
> - which could really help scalability in the case where a system vendor
> has specific capability for this in their existing tree network.

You must be thinking of the capability on Cray XT systems.

> In the case of a large but ordinary cluster, for example, it would be
> preferable not to take shared libs and other daemons or config files
> from a shared filesystem, but for the remote nodes to be able to read
> them from their own /tmp or equivalent local storage having had them
> delivered by the MPI's own daemons.

Again, sounds like what Cray XT does. I'm not sure if other systems support a similar arrangement.

But one thing to be wary of, at least on Cray, is that "/tmp" on the compute node is a memory disk which transparently steals memory from the user's application. So, the bigger the files in /tmp, the less memory is available for the application, and users don't like to "share" memory (no pun intended). So, there are limits to how much stuff can be copied to the compute nodes.
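
To make the trade-off concrete, here is a rough sketch of the arithmetic a tool could do before staging files into a memory-backed /tmp. The function name, its parameters, and the idea of reserving a fixed amount for the application are all assumptions for illustration; nothing like this exists in MPIR or, as far as I know, in any vendor extension.

```python
def tmp_headroom(file_sizes, node_memory, app_memory):
    """Bytes left over after staging files into a memory-backed /tmp.

    Hypothetical helper: node_memory is the total memory on the compute
    node and app_memory is what the application is assumed to need.
    A negative result means the staged files would eat into the
    application's share.
    """
    budget = node_memory - app_memory  # memory not claimed by the application
    return budget - sum(file_sizes)    # what remains once the files are copied


def fits_in_tmp(file_sizes, node_memory, app_memory):
    """True if the staged files fit without taking memory from the application."""
    return tmp_headroom(file_sizes, node_memory, app_memory) >= 0
```

A tool would call fits_in_tmp() with the sizes of its daemon, shared libraries, and config files, and fall back to the shared filesystem when the answer is False.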

> Of course, you could do all this already by appending to your binary
> that file you want to send out with it, but that does feel a bit dirty!

Or you can use a tool like makeself.sh to create a self-extracting .run file.
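
For what it's worth, the append-to-the-binary trick can be sketched in a few lines. This is only an illustration under assumed conventions: the trailer layout (an 8-byte marker followed by a 64-bit payload length) and the names MAGIC, append_payload, and extract_payload are made up here, and it is not how makeself.sh or any MPI launcher actually packages files.

```python
import struct

# Hypothetical trailer: 8-byte marker + little-endian 64-bit payload length.
MAGIC = b"PAYLOAD1"
TRAILER = struct.Struct("<8sQ")


def append_payload(binary: bytes, payload: bytes) -> bytes:
    """Return the binary image with the payload and a trailer appended."""
    return binary + payload + TRAILER.pack(MAGIC, len(payload))


def extract_payload(blob: bytes) -> bytes:
    """Recover the payload by reading the trailer at the end of the blob."""
    if len(blob) < TRAILER.size:
        raise ValueError("blob too short to hold a trailer")
    magic, length = TRAILER.unpack(blob[-TRAILER.size:])
    end = len(blob) - TRAILER.size  # offset where the payload stops
    if magic != MAGIC or length > end:
        raise ValueError("no payload attached")
    return blob[end - length:end]
```

The daemon launched on the node would read its own executable image, call extract_payload() on it, and write the result to local storage. Putting the length at the end means the payload can be located without knowing the size of the original binary.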

Cheers, John D.

> I'll be on the call next week I hope - but thought I'd just raise the
> question now, and let folks ponder it first.
> 
> Best wishes
> David
>