[Mpi-comments] non-collective version of MPI_File_set_view

Fri Feb 25 15:04:48 CST 2022

I would like to suggest a non-collective version of MPI_File_set_view.
The current MPI_File_set_view is defined as follows.
    MPI_FILE_SET_VIEW(fh, disp, etype, filetype, datarep, info)

The last two arguments, datarep and info, may need to be set
collectively, but disp, etype, and filetype are relevant only to
the calling process.

Interleaved calls to collective and independent file read/write
functions are not rare in today's MPI-IO programs.
For example, an application accesses multiple variables of different
sizes. Large variables are partitioned among processes while small
ones are read/written by rank 0 only. Accessing partitioned variables
often uses a filetype, such as one created by MPI_Type_create_subarray,
to set the view. Having only the collective version of
MPI_File_set_view introduces a penalty in this scenario: processes
that do not access the small variables must still take part in the
call whenever the view changes.
To deal with this issue, PnetCDF opens the file twice, once with
MPI_COMM_WORLD and once with MPI_COMM_SELF, so the latter handle can
be used to do independent I/O.

A non-collective version of MPI_File_set_view could look similar, but
without the last two arguments, datarep and info.

An alternative is to create a new set of MPI I/O functions that add
the three arguments disp, etype, and filetype. For example, the new
write function corresponding to
    MPI_FILE_WRITE(fh, buf, count, datatype, status)
would be
    MPI_FILE_WRITE_x(fh, buf, count, datatype, disp, etype, filetype, status)

Wei-keng Liao