[Mpi-forum] MPI - write to same binary file. Each process shows huge difference in time intervals to complete

Jeff Hammond jeff.science at gmail.com
Tue Jul 10 13:21:42 CDT 2018


Hi Catherine,

It sounds like this is an implementation issue.  This email list is for
discussion of the MPI standard itself.  While many of us have a great deal
of implementation expertise, you are likely to get the best response from
the user list associated with the implementation you are using:

MPICH <discuss at mpich.org>
Open-MPI <users at lists.open-mpi.org>
MVAPICH2 <mvapich-discuss at cse.ohio-state.edu>

If you are using Cray MPI, you'll need to contact Cray support, perhaps via
the staff that support your Cray machine locally.  For Intel MPI, start
with
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/.
I don't know about SGI or NEC support, unfortunately.  You may also have
good luck with StackOverflow - there are quite a few MPI experts there.

I'll note that most of the implementations of MPI I/O are based on ROMIO,
which is part of MPICH, so you might want to start with the MPICH user list.

Best,

Jeff


On Tue, Jul 10, 2018 at 11:01 AM, Catherine Jenifer Rajam Rajendran <
catrajen at iu.edu> wrote:

> Hi All,
>
> I am trying to write to a single binary file using MPI. At the start, I set
> each process's offset according to its rank. Then the following C code
> snippet runs. Every MPI process computes its values and writes them to
> exactly the offsets it was assigned.
>
> The problem I am facing is that, out of 32 processes, one finishes in
> 2 hours while the rest keep running for more than 24 hours. The values are
> computed as expected, but it takes far too long. It looks like a deadlock
> situation, with each process waiting for some resource. But I am not
> sharing or communicating anything between the processes. I am just using
> MPI_File_write_at to write at a specific location in the binary file.
>
> I should mention that each process computes a huge amount of data, so
> storing it temporarily seemed inappropriate. I want to write the output to
> a single file because the number of processes increases with the input
> data. The computations are evenly distributed across all processes. So why
> do the processes take such different amounts of time to finish?
>
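> /* Visit every pair (i, j) with 1 <= i < j <= limit. */
> for(i=1;i<=limit;i++)
> {
>     for(j=i+1;j<=limit;j++)
>     {
>         /* Round-robin: this rank handles every num_cpus-th step. */
>         if(my_rank == step%num_cpus)
>         {
>             Calc = Calculation();
>             buf[0] = (double)Calc;
>             /* Independent (non-collective) write of one double. */
>             MPI_File_write_at(outFile, OUT_ofst, buf, 1, MPI_DOUBLE, &status);
>             Calc = 0.0;
>             /* Skip past the slots owned by the other ranks. */
>             OUT_ofst += num_cpus*sizeof(double);
>             count++;
>         }
>         step++;
>     }
> }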
>
> I am new to MPI, and I guess other people must have run into similar issues
> when running MPI programs. Can anyone help me out, please? I can provide
> more details if needed.
>
> Thanks,
> Catherine
>
> _______________________________________________
> mpi-forum mailing list
> mpi-forum at lists.mpi-forum.org
> https://lists.mpi-forum.org/mailman/listinfo/mpi-forum
>
>
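
For readers following the quoted loop, here is a minimal sketch of the file layout it implies, in plain C without MPI. It rests on two assumptions that the message does not state: OUT_ofst starts at my_rank * sizeof(double), and step starts at 0. The helper names owner_of_step, offset_of_write and total_steps are hypothetical and do not appear in the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Rank that handles global step `step` under the round-robin rule
 * `my_rank == step % num_cpus` used in the quoted loop. */
int owner_of_step(long long step, int num_cpus)
{
    assert(num_cpus > 0 && step >= 0);
    return (int)(step % num_cpus);
}

/* Byte offset of the k-th value (k = 0, 1, ...) written by `rank`.
 * ASSUMPTION: OUT_ofst starts at rank * sizeof(double) and then
 * advances by num_cpus doubles after each write. */
long long offset_of_write(int rank, long long k, int num_cpus)
{
    assert(num_cpus > 0 && rank >= 0 && rank < num_cpus && k >= 0);
    return ((long long)rank + k * num_cpus) * (long long)sizeof(double);
}

/* Number of (i, j) pairs with 1 <= i < j <= limit, which is the total
 * number of steps the nested loop performs across all ranks. */
long long total_steps(long long limit)
{
    assert(limit >= 0);
    return limit * (limit - 1) / 2;
}
```

Under these assumptions the value computed at global step s lands at byte s * sizeof(double), so neighbouring doubles in the file belong to different ranks and each rank issues one 8-byte write per value. For example, offset_of_write(3, 2, 32) is the slot for step 3 + 2*32 = 67.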


-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/