[Mpi-forum] MPI - write to same binary file. Each process shows huge difference in time intervals to complete

Catherine Jenifer Rajam Rajendran catrajen at iu.edu
Tue Jul 10 13:25:34 CDT 2018


Hi Jeff,

Thank you so much for the update!
I already posted on Stack Overflow - no help yet! I will check with our
Cray machine support staff and I will shoot an email to MPICH too! Really,
thanks for the response!

Best,
Catherine

On Tue, Jul 10, 2018 at 2:21 PM, Jeff Hammond <jeff.science at gmail.com>
wrote:

> Hi Catherine,
>
> It sounds like this is an implementation issue.  This email list is for
> discussion of the MPI standard itself.  While many of us have a great deal
> of implementation expertise, you are likely to get the best response from
> the user list associated with the implementation you are using:
>
> MPICH <discuss at mpich.org>
> Open-MPI <users at lists.open-mpi.org>
> MVAPICH2 <mvapich-discuss at cse.ohio-state.edu>
>
> If you are using Cray MPI, you'll need to contact Cray support, perhaps
> via the staff that support your Cray machine locally.  For Intel MPI, start
> with https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/.
> I don't know about SGI or NEC support,
> unfortunately.  You may also have good luck with StackOverflow - there are
> quite a few MPI experts there.
>
> I'll note that most of the implementations of MPI I/O are based on ROMIO,
> which is part of MPICH, so you might want to start with the MPICH user list.
>
> Best,
>
> Jeff
>
>
> On Tue, Jul 10, 2018 at 11:01 AM, Catherine Jenifer Rajam Rajendran <
> catrajen at iu.edu> wrote:
>
>> Hi All,
>>
>> I am trying to write to a single binary file using MPI. At the start, I set
>> each process's offset according to its rank. Then the following C code
>> snippet runs. Every MPI process computes its values and writes each one to
>> the offset set for it.
>>
>> The problem I am facing is that, out of 32 processes, say, one process
>> finishes in 2 hours while the rest keep running for more than 24 hours.
>> The computed values are as expected, but it takes far too long. It looks
>> like a deadlock, with each process waiting for some resource, but I am not
>> sharing or communicating anything between the processes. I am just using
>> MPI_File_write_at to write at a specific location in the binary file.
>>
>> I should mention that each process computes a huge amount of data, so
>> storing it temporarily seemed inappropriate. I want to write the output to
>> a single file because the number of processes increases with the input
>> data. The computations are evenly distributed across all processes. So
>> why do the processes take such different amounts of time to finish?
>>
>> for(i=1;i<=limit;i++)
>> {
>>     for(j=i+1;j<=limit;j++)
>>     {
>>         if(my_rank == step%num_cpus)
>>         {
>>             Calc = Calculation();
>>             buf[0] = (double)Calc;
>>             MPI_File_write_at(outFile, OUT_ofst, buf, 1, MPI_DOUBLE, &status);
>>             Calc = 0.0;
>>             OUT_ofst += num_cpus*sizeof(double);
>>             count++;
>>         }
>>         step++;
>>     }
>> }
>>
>> I am new to MPI, and I guess other people must have had similar issues
>> when running MPI programs. Can anyone help me out, please? I can provide
>> more details if needed.
>>
>> Thanks,
>> Catherine
>>
>> _______________________________________________
>> mpi-forum mailing list
>> mpi-forum at lists.mpi-forum.org
>> https://lists.mpi-forum.org/mailman/listinfo/mpi-forum
>>
>>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>