[MPI3-IO] shared file pointer

Mohamad Chaarawi chaarawi at hdfgroup.org
Fri Feb 24 08:19:58 CST 2012


Hi Adam,


On 2/23/2012 11:52 PM, Moody, Adam T. wrote:
> Hi Mohamad,
> One detail we may still need to worry about in the collective case... If I remember correctly, an app is required to initiate collectives in the same order on all procs, but those collectives may complete in any order.  If we extend this to non-blocking I/O collectives but don't specify when the pointer is updated, things get confusing.  For example,
>
> MPI_File_iread_ordered(A)
> MPI_File_iread_ordered(B)
> MPI_Waitany(A or B)
> MPI_Waitany(whatever is left between A and B)
>
> It's possible that the first waitany call will complete B, not A.  If an implementation is allowed to update the pointer in the completion call (the waitany), then B will read the field intended for A.

In the all-collective case, the pointer is still updated in the order
of the calls made.  The calls CAN complete out of order, but that is
not the issue here, I guess.
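
For concreteness, here is that scenario as a minimal compilable sketch,
assuming the draft's MPI_File_iread_ordered (the file name, buffers, and
count are placeholders; error handling omitted).  Under the draft text,
bufA is bound to the first field and bufB to the second at initiation,
whichever request happens to complete first:

#include <mpi.h>

#define COUNT 4   /* placeholder element count */

int main(int argc, char **argv)
{
    MPI_File    fh;
    MPI_Request reqs[2];
    MPI_Status  status;
    int         bufA[COUNT], bufB[COUNT], idx;

    MPI_Init(&argc, &argv);
    MPI_File_open(MPI_COMM_WORLD, "data.bin",   /* placeholder file */
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    /* Both collectives are initiated in the same order on every
     * process; the shared file pointer advances at initiation, so
     * bufA is bound to field 1 and bufB to field 2. */
    MPI_File_iread_ordered(fh, bufA, COUNT, MPI_INT, &reqs[0]);
    MPI_File_iread_ordered(fh, bufB, COUNT, MPI_INT, &reqs[1]);

    /* The first MPI_Waitany may legally complete the second request
     * first; that does not change which data lands in which buffer. */
    MPI_Waitany(2, reqs, &idx, &status);
    MPI_Waitany(2, reqs, &idx, &status);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}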

>
> On the other hand, if the standard declares that the pointer must be updated when the operation is initiated (the iread_ordered call), then the correct field is read into its corresponding variable regardless of the order in which the operations are completed.  I think this is a nice property to strive for, but it's been long enough now that I forget what the current proposal text says in this case.

The current text in the draft says that the pointers are advanced in
the order in which the operations are called (i.e., in the initiation
call), so you will get what you expect when all the calls are
collective.  The mixed case (with shared file pointers) is more
complicated: the non-collective call now depends on the progress of the
other processes to be able to update its pointer and make progress,
which is why we chose to leave the end result of this case undefined
(the Advice to Users added after the iread/iwrite_ordered operations).
We do need to decide whether to keep this case undefined before Monday,
which is the deadline for changing the draft.
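
The mixed case, sketched (fh and the buffers are placeholders): the
independent read below races with the in-flight ordered collective, and
that race is exactly what the Advice to Users leaves undefined.

#include <mpi.h>

#define COUNT 4   /* placeholder element count */

/* fh is an open handle shared by all processes on the communicator. */
void mixed_case(MPI_File fh)
{
    MPI_Request req;
    MPI_Status  status;
    int         bufA[COUNT], bufB[COUNT];

    /* Nonblocking collective read through the shared file pointer. */
    MPI_File_iread_ordered(fh, bufA, COUNT, MPI_INT, &req);

    /* Independent shared-file-pointer read issued while the ordered
     * collective is still in flight.  Its offset depends on the
     * progress of the other processes, so the draft leaves the
     * outcome undefined. */
    MPI_File_read_shared(fh, bufB, COUNT, MPI_INT, &status);

    MPI_Wait(&req, &status);
}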


Thanks,
Mohamad

> -Adam
>
> ________________________________________
> From: mpi3-io-bounces at lists.mpi-forum.org [mpi3-io-bounces at lists.mpi-forum.org] On Behalf Of Mohamad Chaarawi [chaarawi at hdfgroup.org]
> Sent: Thursday, February 23, 2012 1:48 PM
> To: mpi3-io at lists.mpi-forum.org
> Subject: Re: [MPI3-IO] shared file pointer
>
> Hi Adam,
>
> On 2/22/2012 7:25 PM, Adam T. Moody wrote:
>> I will say, though, given that we can now have multiple outstanding
>> non-blocking collective I/O calls, it would be really nice to be able
>> to do the following:
>>
>> MPI_File_iread_ordered()
>> MPI_File_iread_ordered()
>> MPI_File_iread_ordered()
>> MPI_Waitall()
>>
>> This provides a natural way for someone to read in three different
>> sections from a file -- just issue all the calls and sit back and
>> wait.  However, this can only be done if the pointer is updated in the
>> initiation call.
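
Written out, that pattern is a short compilable sketch (the count and
buffer names are placeholders; error handling omitted).  Because the
shared pointer advances at initiation, each call reads the next section
of the file regardless of completion order:

#include <mpi.h>

#define COUNT 1024   /* placeholder section length */

/* Reads three consecutive sections of the file: all processes issue
 * the three collectives in the same order, then wait for all three. */
void read_three_sections(MPI_File fh, int a[], int b[], int c[])
{
    MPI_Request reqs[3];
    MPI_Status  statuses[3];

    MPI_File_iread_ordered(fh, a, COUNT, MPI_INT, &reqs[0]);
    MPI_File_iread_ordered(fh, b, COUNT, MPI_INT, &reqs[1]);
    MPI_File_iread_ordered(fh, c, COUNT, MPI_INT, &reqs[2]);

    MPI_Waitall(3, reqs, statuses);
}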
> I don't see how the current proposal would prevent you from doing this.
> The only ordering that we say is undefined is when you mix collective
> and independent operations.
>
> Thanks,
> Mohamad
>
>> -Adam
>>
>>
>> Adam T. Moody wrote:
>>
>>> Hi Mohamad and Dries,
>>> Yes, I see your point now about "using the corresponding blocking
>>> collective routines when ... the end call is issued".  I don't think
>>> that's what the standard intended, but you're right that the text
>>> says two different things.  Some statements say the pointer is
>>> updated by the call that initiates the operation, i.e., the _begin
>>> call, but this passage says the opposite: an implementation is
>>> allowed to do all the work (including updating the pointer) in the
>>> _end call.  Thus, it's not clear whether the pointer will always be
>>> updated after returning from the _begin call.
>>> -Adam
>>>
>>>
>>> Mohamad Chaarawi wrote:
>>>
>>>> Hi Adam,
>>>>
>>>>>
>>>>> This statement says that an app can't know whether the begin call
>>>>> will synchronize or not, so a portable app must assume that the
>>>>> call does synchronize.  However, the earlier statements say that
>>>>> regardless of whether the MPI library implements the begin call as
>>>>> blocking or non-blocking, the app is always guaranteed that the
>>>>> shared file pointer will be updated upon return from the begin call.
>>>>>
>>>> Yes, but I agree with Dries that there is a contradiction, and it
>>>> can be interpreted by a developer either way, i.e., the pointer can
>>>> be updated in either the begin or the end call.
>>>>
>>>>>
>>>>> With split collectives, the "begin" call that initiates the
>>>>> operation *can* block, but with non-blocking collectives (as
>>>>> currently defined), the "i" call that initiates the operation
>>>>> *never* blocks.  It's this difference between split collectives and
>>>>> non-blocking collectives that causes the difficulty here.  To
>>>>> efficiently meet the requirements of updating the shared file
>>>>> pointer, we'd really like to update the pointer during the "i"
>>>>> call, but this would require the "i" call to block.
>>>>>
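
To make that difference concrete, a sketch of the two initiation styles
(fh, buf, and the count are placeholders; error handling omitted):

#include <mpi.h>

#define COUNT 4   /* placeholder element count */

/* Split collective: the _begin call may block, and as discussed above
 * an implementation could even defer all the work (including the
 * shared pointer update) to the _end call. */
void split_collective(MPI_File fh, int buf[])
{
    MPI_Status status;

    MPI_File_read_ordered_begin(fh, buf, COUNT, MPI_INT);
    /* ... overlapped computation ... */
    MPI_File_read_ordered_end(fh, buf, &status);
}

/* Nonblocking collective: the "i" call must never block, so it cannot
 * simply wait on the other processes before updating the pointer. */
void nonblocking_collective(MPI_File fh, int buf[])
{
    MPI_Request req;
    MPI_Status  status;

    MPI_File_iread_ordered(fh, buf, COUNT, MPI_INT, &req);
    /* ... overlapped computation ... */
    MPI_Wait(&req, &status);
}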
>>>> I do not have a strong opinion here, as we don't really use this
>>>> feature.  But I can see how this could complicate things further
>>>> for the user and the developer, which makes me more inclined to
>>>> keep the ordering undefined.
>>>> That said, we do want to start working on a ticket for new MPI-I/O
>>>> features that would actually track order inside the implementation
>>>> for nonblocking file access and manipulation routines (more like
>>>> queuing).  We discussed that at the last Chicago meeting.  This is
>>>> not bound for MPI-3.0, though :)
>>>>
>>>> Thanks,
>>>> Mohamad



