[MPI3-IO] shared file pointer

Dries Kimpe dkimpe at mcs.anl.gov
Thu Feb 16 15:38:01 CST 2012


* Adam T. Moody <moody20 at llnl.gov> [2012-02-16 12:43:35]:

> Hi guys,
> My interpretation of the standard is that the following use of split 
> collectives is ok:

> MPI_File_read_ordered_begin()
> MPI_File_read_shared()
> MPI_File_read_ordered_end()

> This understanding comes from the following:

> p405, 17-9
> "In general, each I/O operation leaves the file pointer pointing to the 
> next data item after the last one that is accessed by the operation.  In 
> a nonblocking or split collective operation, the pointer is updated by 
> the call that initiates the I/O, possibly before the access completes."

> and:

> p417, 1-2
> "After a shared file pointer operation is initiated, the shared file 
> pointer is updated to point to the next etype after the last one that 
> will be accessed."

Contrast that with (page 422 in the MPI-2.2 document):

An implementation is free to implement any split collective data access
routine using the corresponding blocking collective routine when either
the begin call (e.g., MPI_FILE_READ_ALL_BEGIN) or the end call (e.g.,
MPI_FILE_READ_ALL_END) is issued. The begin and end calls are provided to
allow the user and MPI implementation to optimize the collective
operation.

Note that I never claimed that your example was not allowed; I simply
pointed out that, according to the text above, the end result is not
deterministic.

[ text removed ]

> However, to do this, the read_shared() call would need to block until 
> all procs have called iread_ordered().  This does pose two difficulties:
>     1) It requires MPI to track any outstanding non-blocking collective 
> calls so that it knows to wait on them.

That is pretty easy. An attribute on the file handle can be used for that.

>     2) It forces read_shared() into this limbo between being local / not 
> local.  It's supposed to be a local call, but in this context, it can't 
> return until all other procs have called the collectives that come 
> before it.  So is it still "local"?

This is pretty major if you ask me.

Especially since an easy solution to avoid this problem is opening the
file twice (so you effectively get two shared file pointers).

I'm wondering if there are any other functions already in this position?
The best I can come up with is MPI_Recv (and friends) and MPI_Send,
since technically those can block until the other side posts a matching
operation, but that is easily motivated in the standard.

The text you quoted is interesting though, because at least it makes it
very clear that the shared file pointer is updated first, possibly before
the data is accessed.

This clarifies what happens in a fault situation (read error or
otherwise): the shared file pointer is moved to point to the first byte
after the region that is *requested*, not after the region that is
actually successfully accessed.

So, even if the call fails to read any data, the shared file pointer can
still point to the position it would have had if the call had succeeded.

  Dries
