[MPI3-IO] shared file pointer
Adam T. Moody
moody20 at llnl.gov
Thu Feb 16 16:08:43 CST 2012
Dries Kimpe wrote:
>* Adam T. Moody <moody20 at llnl.gov> [2012-02-16 12:43:35]:
>
>>Hi guys,
>>My interpretation of the standard is that the following use of split
>>collectives is ok:
>>
>>MPI_File_read_ordered_begin()
>>MPI_File_read_shared()
>>MPI_File_read_ordered_end()
>>
>>This understanding comes from the following:
>>
>>p405, 17-9
>>"In general, each I/O operation leaves the file pointer pointing to the
>>next data item after the last one that is accessed by the operation. In
>>a nonblocking or split collective operation, the pointer is updated by
>>the call that initiates the I/O, possibly before the access completes."
>>
>>and:
>>
>>p417, 1-2
>>"After a shared file pointer operation is initiated, the shared file
>>pointer is updated to point to the next etype after the last one that
>>will be accessed."
>
>Contrast that with (page 422 in the MPI-2.2 document):
>
>An implementation is free to implement any split collective data access
>routine using the corresponding blocking collective routine when either
>the begin call (e.g., MPI_FILE_READ_ALL_BEGIN) or the end call (e.g.,
>MPI_FILE_READ_ALL_END) is issued. The begin and end calls are provided to
>allow the user and MPI implementation to optimize the collective
>operation.
>
>Note that I never claimed that your example was not allowed; I simply
>pointed out that, according to the text above, the end result is not
>deterministic.
>
>[ text removed ]
>
>
Hi Dries,
This statement says that an app can't know whether the begin call will
synchronize or not, so a portable app must assume that the call does
synchronize. However, the earlier statements say that regardless of
whether the MPI library implements the begin call as blocking or
non-blocking, the app is always guaranteed that the shared file pointer
will be updated upon return from the begin call.
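For concreteness, here is a minimal sketch of the pattern quoted at the top
(not taken from the thread; it assumes fh is already open on all ranks and
the etype is MPI_INT). The point is that the offset seen by read_shared()
is well defined because the begin call has already advanced the shared
pointer, whether or not it blocked internally:

    #include <mpi.h>

    void ordered_then_shared(MPI_File fh, int *ordered_buf, int *extra, int count)
    {
        MPI_Status status;

        /* Initiates the ordered collective read; per the text quoted
         * above, the shared file pointer is advanced past the region
         * all ranks will access before this call returns. */
        MPI_File_read_ordered_begin(fh, ordered_buf, count, MPI_INT);

        /* Independent shared-pointer read; starts after that region. */
        MPI_File_read_shared(fh, extra, 1, MPI_INT, &status);

        /* Completes the ordered collective read. */
        MPI_File_read_ordered_end(fh, ordered_buf, &status);
    }
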
With split collectives, the "begin" call that initiates the operation
*can* block, but with non-blocking collectives (as currently defined),
the "i" call that initiates the operation *never* blocks. It's this
difference between split collectives and non-blocking collectives that
causes the difficulty here. To efficiently meet the requirements of
updating the shared file pointer, we'd really like to update the pointer
during the "i" call, but this would require the "i" call to block.
-Adam
>
>
>>However, to do this, the read_shared() call would need to block until
>>all procs have called iread_ordered(). This does pose two difficulties:
>> 1) It requires MPI to track any outstanding non-blocking collective
>>calls so that it knows to wait on them.
>
>That is pretty easy. An attribute on the file handle can be used for that.
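A rough sketch of the attribute idea (all names here are illustrative, not
from any real implementation): stash the outstanding request in a
file-handle attribute, and have the shared-pointer path wait on it first.

    #include <mpi.h>
    #include <stdlib.h>

    static int pending_keyval = MPI_KEYVAL_INVALID;

    /* Record an outstanding nonblocking collective on this file handle. */
    void remember_pending(MPI_File fh, MPI_Request req)
    {
        MPI_Request *slot;
        if (pending_keyval == MPI_KEYVAL_INVALID) {
            MPI_File_create_keyval(MPI_FILE_NULL_COPY_FN, MPI_FILE_NULL_DELETE_FN,
                                   &pending_keyval, NULL);
        }
        slot = malloc(sizeof(*slot));
        *slot = req;
        MPI_File_set_attr(fh, pending_keyval, slot);
    }

    /* Called at the start of a shared-pointer operation: if a nonblocking
     * collective is still outstanding on this handle, wait for it first. */
    void wait_for_pending(MPI_File fh)
    {
        MPI_Request *slot;
        int flag = 0;
        if (pending_keyval != MPI_KEYVAL_INVALID) {
            MPI_File_get_attr(fh, pending_keyval, &slot, &flag);
            if (flag && slot != NULL) {
                MPI_Wait(slot, MPI_STATUS_IGNORE);
                free(slot);
                MPI_File_set_attr(fh, pending_keyval, NULL);
            }
        }
    }
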
>
>> 2) It forces read_shared() into this limbo between being local / not
>>local. It's supposed to be a local call, but in this context, it can't
>>return until all other procs have called the collectives that come
>>before it. So is it still "local"?
>
>This is pretty major if you ask me.
>
>Especially since an easy solution to avoid this problem is opening the
>file twice (so you effectively get two shared file pointers).
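For reference, a minimal sketch of that workaround (assuming both opens use
the same communicator and access mode): each handle carries its own shared
file pointer, so the two kinds of access no longer interact.

    #include <mpi.h>

    void two_handles(const char *filename, int *ordered_buf, int *extra, int count)
    {
        MPI_File fh_ordered, fh_shared;
        MPI_Status status;

        /* Two opens of the same file: each handle gets its own shared pointer. */
        MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_RDONLY, MPI_INFO_NULL,
                      &fh_ordered);
        MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_RDONLY, MPI_INFO_NULL,
                      &fh_shared);

        /* Ordered collective access through the first handle... */
        MPI_File_read_ordered(fh_ordered, ordered_buf, count, MPI_INT, &status);

        /* ...independent shared-pointer access through the second. */
        MPI_File_read_shared(fh_shared, extra, 1, MPI_INT, &status);

        MPI_File_close(&fh_ordered);
        MPI_File_close(&fh_shared);
    }
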
>
>I'm wondering if there are any other functions already in this position?
>The best I can come up with is MPI_Recv (and friends) and MPI_Send,
>since technically those can block until the other side posts a matching
>operation, but that is easily motivated in the standard.
>
>The text you quoted is interesting though, because at least it makes it
>very clear that the shared file pointer is updated first, possibly before
>the data is accessed.
>
>This clarifies what happens in a fault situation (read error or
>otherwise): the shared file pointer is moved to point to the first byte
>after the region that is *requested*, not after the region that is
>actually successfully accessed.
>
>So, even if the call fails to read any data, the shared file pointer can still
>point to the position it would have had if the call succeeded.
>
> Dries