\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.132 l.1 - p.144 l.44, File 1.3/context.tex, lines 1-722 \chapter{Groups, Contexts, and Communicators} \label{sec:context} \label{chap:context} %Version of 4/27/95 \section{Introduction} This chapter introduces \MPI/ features that support the development of parallel libraries. Parallel libraries are needed to encapsulate the distracting complications inherent in parallel implementations of key algorithms. They help to ensure consistent correctness of such procedures, and provide a ``higher level'' of portability than \MPI/ itself can provide. As such, libraries prevent each programmer from repeating the work of defining consistent data structures, data layouts, and methods that implement key algorithms (such as matrix operations). Since the best libraries come with several variations on parallel systems (different data layouts, different strategies depending on the size of the system or problem, or type of floating point), this too needs to be hidden from the user. We refer the reader to \cite{MPILIB} and \cite{MPIPP} for further information on writing libraries in \MPI/, using the features described in this chapter. \subsection{Features Needed to Support Libraries} The key features needed to support the creation of robust parallel libraries are as follows: \begin{itemize} \item Safe communication space, that guarantees that libraries can communicate as they need to, without conflicting with communication extraneous to the library, \item Group scope for collective operations, that allow libraries to avoid unnecessarily synchronizing uninvolved processes (potentially running unrelated code), \item Abstract process naming to allow libraries to describe their communication in terms suitable to their own data structures and algorithms, \item The ability to ``adorn'' a set of communicating processes with additional user-defined attributes, such as extra collective operations. This mechanism should provide a means for the user or library writer effectively to extend a message-passing notation. \end{itemize} In addition, a unified mechanism or object is needed for conveniently denoting communication context, the group of communicating processes, to house abstract process naming, and to store adornments. \subsection{\MPI/'s Support for Libraries} The corresponding concepts that \MPI/ provides, specifically to support robust libraries, are as follows: \begin{itemize} \item {\bf Contexts} of communication, \item {\bf Groups} of processes, \item {\bf Virtual topologies}, \item {\bf Attribute caching}, \item {\bf Communicators}. \end{itemize} {\bf Communicators} (see \cite{communicator,zipcode1,Skj93b}) encapsulate all of these ideas in order to provide the appropriate scope for all communication operations in \MPI/. Communicators are divided into two kinds: \mpiiidotiMergeNEWforSINGLEbegin% MPI-2.1 round-two - begin of modification % intra-communicators for operations within a single group of processes, and % inter-communicators, for point-to-point communication between two groups of intra-communicators for operations within a single group of processes and inter-communicators for operations between two groups of \mpiiidotiMergeNEWforSINGLEendI% MPI-2.1 round-two - end of modification processes. \paragraph{Caching.} Communicators (see below) provide a ``caching'' mechanism that allows one to associate new attributes with communicators, on a par with \MPI/ built-in features. 
This can be used by advanced users to adorn communicators further, and by \MPI/ to implement some communicator functions. For example, the virtual-topology functions described in Chapter~\ref{chap:topol} are likely to be supported this way. \paragraph{Groups.} Groups define an ordered collection of processes, each with a rank, and it is this group that defines the low-level names for inter-process communication (ranks are used for sending and receiving). Thus, groups define a scope for process names in point-to-point communication. In addition, groups define the scope of collective operations. Groups may be manipulated separately from communicators in \MPI/, but only communicators can be used in communication operations. \paragraph{Intra-communicators.} The most commonly used means for message passing in \MPI/ is via intra-communicators. Intra-communicators contain an instance of a group, contexts of communication for both point-to-point and collective communication, and the ability to include virtual topology and other attributes. These features work as follows: \begin{itemize} \item {\bf Contexts\/} provide the ability to have separate safe ``universes'' of message passing in \MPI/. A context is akin to an additional tag that differentiates messages. The system manages this differentiation process. The use of separate communication contexts by distinct libraries (or distinct library invocations) insulates communication internal to the library execution from external communication. This allows the invocation of the library even if there are pending communications on ``other'' communicators, and avoids the need to synchronize entry or exit into library code. Pending point-to-point communications are also guaranteed not to interfere with collective communications within a single communicator. \item {\bf Groups} define the participants in the communication (see above) of a communicator. \item A {\bf virtual topology} defines a special mapping of the ranks in a group to and from a topology. Special constructors for communicators are defined in chapter~\ref{chap:topol} to provide this feature. Intra-communicators as described in this chapter do not have topologies. \item {\bf Attributes} define the local information that the user or library has added to a communicator for later reference. \end{itemize} \begin{users} \mpiiidotiMergeNEWforSINGLEbegin% MPI-2.1 round-two - begin of modification % The current practice in many communication libraries is that there is The practice in many communication libraries is that there is \mpiiidotiMergeNEWforSINGLEendI% MPI-2.1 round-two - end of modification a unique, predefined communication universe that includes all processes available when the parallel program is initiated; the processes are assigned consecutive ranks. Participants in a point-to-point communication are identified by their rank; a collective communication (such as broadcast) always involves all processes. This practice can be followed in \MPI/ by using the predefined communicator \mpiarg{MPI\_COMM\_WORLD}. {\em Users who are satisfied with this practice can plug in \mpiarg{MPI\_COMM\_WORLD} wherever a communicator argument is required, and can consequently disregard the rest of this chapter.} \end{users} \paragraph{Inter-communicators.} The discussion has dealt so far with {\bf intra-communication}: communication within a group. \MPI/ also supports {\bf inter-communication}: communication between two non-overlapping groups. 
When an application is built by composing several parallel modules, it is convenient to allow one module to communicate with another using local ranks for addressing within the second module. This is especially convenient in a client-server computing paradigm, where either the client or the server is parallel. The support of inter-communication also provides a mechanism for the extension of \MPI/ to a dynamic model where not all processes are preallocated at initialization time. In such a situation, it becomes necessary to support communication across ``universes.'' Inter-communication is supported by objects called {\bf inter-communicators}. These objects bind two groups together with communication contexts shared by both groups. For inter-communicators, these features work as follows:
\begin{itemize}
\item {\bf Contexts\/} provide the ability to have a separate safe ``universe'' of message passing between the two groups. A send in the local group is always a receive in the remote group, and vice versa. The system manages this differentiation process. The use of separate communication contexts by distinct libraries (or distinct library invocations) insulates communication internal to the library execution from external communication. This allows the invocation of the library even if there are pending communications on ``other'' communicators, and avoids the need to synchronize entry or exit into library
\mpiiidotiMergeFromREVIEWbegin{9.a}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
code.
% There is no general-purpose
% collective communication on inter-communicators, so
% contexts are used just to isolate point-to-point communication.
\mpiiidotiMergeFromREVIEWendI{9.a}% MPI-2.1 End of review based correction
\item A local and remote group specify the recipients and destinations for an inter-com\-mun\-i\-ca\-tor.
\item Virtual topology is undefined for an inter-communicator.
\item As before, the attribute cache defines the local information that the user or library has added to a communicator for later reference.
\end{itemize}
\MPI/ provides mechanisms for creating and manipulating inter-communicators. They are used for point-to-point
\mpiiidotiMergeFromREVIEWbegin{9.b}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
and collective
\mpiiidotiMergeFromREVIEWendI{9.b}% MPI-2.1 End of review based correction
communication in a manner related to intra-communicators. Users who do not need inter-communication in their applications can safely ignore this extension.
\mpiiidotiMergeFromREVIEWbegin{9.b}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
% Users who need collective operations via inter-communicators
% must layer it on top of \MPI/.
Users
\mpiiidotiMergeFromREVIEWendI{9.b}% MPI-2.1 End of review based correction
who require inter-communication between overlapping groups
\mpiiidotiMergeFromREVIEWbegin{9.b}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
% must also layer
must layer
\mpiiidotiMergeFromREVIEWendI{9.b}% MPI-2.1 End of review based correction
this capability on top of \MPI/.

\section{Basic Concepts}
% 22-1-1

In this section, we turn to a more formal definition of the concepts introduced above.

\subsection{Groups}

A {\bf group} is an ordered set of process identifiers (henceforth processes); processes are implementation-dependent objects. Each process in a group is associated with an integer {\bf rank}. Ranks are contiguous and start from zero.
Groups are represented by opaque {\bf group objects}, and hence cannot be directly transferred from one process to another. A group is used within a communicator to describe the participants in a communication ``universe'' and to rank such participants (thus giving them unique names within that ``universe'' of communication).

There is a special pre-defined group: \const{MPI\_GROUP\_EMPTY}, which is a group with no members. The predefined constant \const{MPI\_GROUP\_NULL} is the value used for invalid group handles.

\begin{users}
\const{MPI\_GROUP\_EMPTY}, which is a valid handle to an empty group, should not be confused with \const{MPI\_GROUP\_NULL}, which in turn is an invalid handle. The former may be used as an argument to group operations; the latter, which is returned when a group is freed, is not a valid argument.
\end{users}

\begin{implementors}
A group may be represented by a virtual-to-real process-address-translation table. Each communicator object (see below) would have a pointer to such a table.

Simple implementations of \MPI/ will enumerate groups, such as in a table. However, more advanced data structures make sense in order to improve scalability and memory usage with large numbers of processes. Such implementations are possible with \MPI/.
\end{implementors}

\subsection{Contexts}

A {\bf context} is a property of communicators (defined next) that allows partitioning of the communication space. A message sent in one context cannot be received in another context. Furthermore, where permitted, collective operations are independent of pending point-to-point operations. Contexts are not explicit \MPI/ objects; they appear only as part of the realization of communicators (below).

\begin{implementors}
Distinct communicators in the same process have distinct contexts. A context is essentially a system-managed tag (or tags) needed to make a communicator safe for point-to-point and \MPI/-defined collective communication. Safety means that collective and point-to-point communication within one communicator do not interfere, and that communication over distinct communicators does not interfere.

A possible implementation for a context is as a supplemental tag attached to messages on send and matched on receive. Each intra-communicator stores the value of its two tags (one for point-to-point and one for collective communication). Communicator-generating functions use a collective communication to agree on a new group-wide unique context. Analogously, in
\mpiiidotiMergeFromREVIEWbegin{9.c}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
inter-communication,
% (which is strictly point-to-point communication),
\mpiiidotiMergeFromREVIEWendI{9.c}% MPI-2.1 End of review based correction
two context tags are stored per communicator, one used by group A to send and group B to receive, and a second used by group B to send and group A to receive.

Since contexts are not explicit objects, other implementations are also possible.
\end{implementors}

\subsection{Intra-Communicators}

Intra-communicators bring together the concepts of group and context. To support \linebreak implementation-specific optimizations, and application topologies (defined in the next chapter, chapter~\ref{chap:topol}), communicators may also ``cache'' additional information (see section~\ref{sec:caching}). \MPI/ communication operations reference communicators to determine the scope and the ``communication universe'' in which a point-to-point or collective operation is to operate.
Each communicator contains a group of valid participants; this group always includes the local process. The source and destination of a message are identified by process rank within that group.

For collective communication, the intra-communicator specifies the set of processes that participate in the collective operation (and their order, when significant). Thus, the communicator restricts the ``spatial'' scope of communication, and provides machine-independent process addressing through ranks.

Intra-communicators are represented by opaque {\bf intra-communicator objects}, and hence cannot be directly transferred from one process to another.

\subsection{Predefined Intra-Communicators}
\label{sec:predef-comms}

An initial intra-communicator \const{MPI\_COMM\_WORLD} of all processes the local process can communicate with after initialization (itself included) is defined once \mpifunc{MPI\_INIT}
\mpiiidotiMergeFromREVIEWbegin{9.d}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
or \mpifunc{MPI\_INIT\_THREAD}
\mpiiidotiMergeFromREVIEWendI{9.d}% MPI-2.1 End of review based correction
has been called. In addition, the communicator \const{MPI\_COMM\_SELF} is provided, which includes only the process itself.

The predefined constant \const{MPI\_COMM\_NULL} is the value used for invalid communicator handles.

In a static-process-model implementation of \MPI/, all processes that participate in the computation are available after \MPI/ is initialized. For this case, \const{MPI\_COMM\_WORLD} is a communicator of all processes available for the computation; this communicator has the same value in all processes. In an implementation of \MPI/ where processes can dynamically join an \MPI/ execution, it may be the case that a process starts an \MPI/ computation without having access to all other processes. In such situations, \const{MPI\_COMM\_WORLD} is a communicator incorporating all processes with which the joining process can immediately communicate. Therefore, \const{MPI\_COMM\_WORLD} may simultaneously
\mpiiidotiMergeFromREVIEWbegin{9.e}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
% have different values
represent disjoint groups
\mpiiidotiMergeFromREVIEWendI{9.e}% MPI-2.1 End of review based correction
in different processes.

All \MPI/ implementations are required to provide the \const{MPI\_COMM\_WORLD} communicator. It cannot be deallocated during the life of a process. The group corresponding to this communicator does not appear as a pre-defined constant, but it may be accessed using \func{MPI\_COMM\_GROUP} (see below). \MPI/ does not specify the correspondence between the process rank in \const{MPI\_COMM\_WORLD} and its (machine-dependent) absolute address. Neither does \MPI/ specify the function of the host process, if any. Other implementation-dependent, predefined communicators may also be provided.

\section{Group Management}
%22-0-1

This section describes the manipulation of process groups in \MPI/. These
\mpiiidotiMergeNEWforSINGLEbegin% MPI-2.1 round-two - begin of modification
% operations are local and their execution do not require interprocess
operations are local and their execution does not require interprocess
\mpiiidotiMergeNEWforSINGLEendI% MPI-2.1 round-two - end of modification
communication.
\subsection{Group Accessors} \label{subsec:context-grpacc} \begin{funcdef}{MPI\_GROUP\_SIZE(group, size)} \funcarg{\IN}{group}{ group (handle)} \funcarg{\OUT}{size}{ number of processes in the group (integer) } \end{funcdef} \mpibind{MPI\_Group\_size(MPI\_Group~group, int~*size)} \mpifbind{MPI\_GROUP\_SIZE(GROUP, SIZE, IERROR)\fargs INTEGER GROUP, SIZE, IERROR} \mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed \mpicppemptybind{MPI::Group::Get\_size() const}{int} \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 \begin{funcdef}{MPI\_GROUP\_RANK(group, rank)} \funcarg{\IN}{group}{ group (handle)} \funcarg{\OUT}{rank}{ rank of the calling process in group, or \linebreak \const{ MPI\_UNDEFINED} if the process is not a member (integer)} \end{funcdef} \mpibind{MPI\_Group\_rank(MPI\_Group~group, int~*rank)} \mpifbind{MPI\_GROUP\_RANK(GROUP, RANK, IERROR)\fargs INTEGER GROUP, RANK, IERROR} \mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed \mpicppemptybind{MPI::Group::Get\_rank() const}{int} \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 \begin{funcdef}{MPI\_GROUP\_TRANSLATE\_RANKS (group1, n, ranks1, group2, ranks2)} \funcarg{\IN}{group1}{ group1 (handle) } \funcarg{\IN}{n}{ number of ranks in \mpiarg{ ranks1} and \mpiarg{ranks2} arrays (integer)} \funcarg{\IN}{ranks1}{ array of zero or more valid ranks in group1 } \funcarg{\IN}{group2}{ group2 (handle)} \funcarg{\OUT}{ranks2}{ array of corresponding ranks in group2, % \const{MPI\_UNDE-} \const{FINED} %% not appropriate for automized index \const{MPI\_UNDEFINED} when no correspondence exists.} \end{funcdef} \mpibind{MPI\_Group\_translate\_ranks (MPI\_Group~group1, int~n, int~*ranks1, MPI\_Group~group2, int~*ranks2)} \mpifbind{MPI\_GROUP\_TRANSLATE\_RANKS(GROUP1, N, RANKS1, GROUP2, RANKS2, IERROR)\fargs INTEGER GROUP1, N, RANKS1(*), GROUP2, RANKS2(*), IERROR} \mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed \mpicppemptybind{MPI::Group::Translate\_ranks (const~MPI::Group\&~group1, int~n, const~int~ranks1[], const~MPI::Group\&~group2, int~ranks2[])}{static void} \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 This function is important for determining the relative numbering of the same processes in two different groups. For instance, if one knows the ranks of certain processes in the group of \const{MPI\_COMM\_WORLD}, one might want to know their ranks in a subset of that group. \mpiiidotiMergeFromBALLOTbegin{2}{2}% MPI-2.1 Ballots 1-4 % 3.2.12 MPI\_GROUP\_TRANSLATE\_RANKS and MPI\_PROC\_NULL % \const{MPI\_PROC\_NULL} is a valid rank for input to \mpifunc{MPI\_GROUP\_TRANSLATE\_RANKS}, which returns \constskip{MPI\_PROC\_NULL} as the translated rank. 
\mpiiidotiMergeFromBALLOTendI{2}{2}% MPI-2.1 Ballots 1-4

\begin{funcdef}{MPI\_GROUP\_COMPARE(group1, group2, result)}
\funcarg{\IN}{group1}{ first group (handle)}
\funcarg{\IN}{group2}{ second group (handle)}
\funcarg{\OUT}{result}{ result (integer)}
\end{funcdef}

\mpibind{MPI\_Group\_compare(MPI\_Group~group1,MPI\_Group~group2,~int~*result)}

\mpifbind{MPI\_GROUP\_COMPARE(GROUP1, GROUP2, RESULT, IERROR)\fargs INTEGER GROUP1, GROUP2, RESULT, IERROR}

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Group::Compare(const~MPI::Group\&~group1, const~MPI::Group\&~group2)}{static int}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1

\noindent\const{MPI\_IDENT} results if the group members and group order are exactly the same in both groups. This happens, for instance, if \mpiarg{group1} and \mpiarg{group2} are the same handle. \const{MPI\_SIMILAR} results if the group members are the same but the order is different. \const{MPI\_UNEQUAL} results otherwise.

\subsection{Group Constructors}
\label{subsec:context-grpconst}

Group constructors are used to subset and superset existing groups. These constructors construct new groups from existing groups. These are local operations, and distinct groups may be defined on different processes; a process may also define a group that does not include itself. Consistent definitions are required when groups are used as arguments in communicator-building functions. \MPI/ does not provide a mechanism to build a group from scratch, but only from other, previously defined groups. The base group, upon which all other groups are defined, is the group associated with the initial communicator \mpiarg{MPI\_COMM\_WORLD} (accessible through the function \func{MPI\_COMM\_GROUP}).

\begin{rationale}
In what follows, there is no group duplication function analogous to \mpifunc{MPI\_COMM\_DUP}, defined later in this chapter. There is no need for a group duplicator. A group, once created, can have several references to it by making copies of the handle. The following constructors address the need for subsets and supersets of existing groups.
\end{rationale}

\begin{implementors}
Each group constructor behaves as if it returned a new group object. When this new group is a copy of an existing group, then one can avoid creating such new objects, using a reference-count mechanism.
\end{implementors}

\begin{funcdef}{MPI\_COMM\_GROUP(comm, group)}
\funcarg{\IN}{comm}{ communicator (handle)}
\funcarg{\OUT}{group}{ group corresponding to \mpiarg{comm} (handle)}
\end{funcdef}

\mpibind{MPI\_Comm\_group(MPI\_Comm~comm, MPI\_Group~*group)}

\mpifbind{MPI\_COMM\_GROUP(COMM, GROUP, IERROR)\fargs INTEGER COMM, GROUP, IERROR}

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Comm::Get\_group() const}{MPI::Group}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1

\mpifunc{MPI\_COMM\_GROUP} returns in \mpiarg{group} a handle to the group of \mpiarg{comm}.
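For illustration, the following sketch obtains the rank of the calling process in \const{MPI\_COMM\_WORLD} by translating its rank in the group of \const{MPI\_COMM\_SELF} (in which every process has rank zero); beyond an initialized \MPI/ execution, nothing is assumed.
\begin{verbatim}
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Group world_group, self_group;
    int zero = 0, world_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Comm_group(MPI_COMM_SELF, &self_group);

    /* The calling process has rank 0 in the group of MPI_COMM_SELF;
       translating that rank into the world group yields its rank in
       MPI_COMM_WORLD */
    MPI_Group_translate_ranks(self_group, 1, &zero, world_group, &world_rank);
    printf("My rank in MPI_COMM_WORLD is %d\n", world_rank);

    MPI_Group_free(&self_group);
    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}
\end{verbatim}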
\begin{funcdef}{MPI\_GROUP\_UNION(group1, group2, newgroup)}
\funcarg{\IN}{group1}{ first group (handle)}
\funcarg{\IN}{group2}{ second group (handle)}
\funcarg{\OUT}{newgroup}{ union group (handle)}
\end{funcdef}

\mpibind{MPI\_Group\_union(MPI\_Group~group1, MPI\_Group~group2, MPI\_Group~*newgroup)}

\mpifbind{MPI\_GROUP\_UNION(GROUP1, GROUP2, NEWGROUP, IERROR)\fargs INTEGER GROUP1, GROUP2, NEWGROUP, IERROR}

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Group::Union(const~MPI::Group\&~group1, const~MPI::Group\&~group2)}{static MPI::Group}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1

\begin{funcdef}{MPI\_GROUP\_INTERSECTION(group1, group2, newgroup)}
\funcarg{\IN}{group1}{ first group (handle)}
\funcarg{\IN}{group2}{ second group (handle)}
\funcarg{\OUT}{newgroup}{ intersection group (handle)}
\end{funcdef}

\mpibind{MPI\_Group\_intersection(MPI\_Group~group1, MPI\_Group~group2, MPI\_Group~*newgroup)}

\mpifbind{MPI\_GROUP\_INTERSECTION(GROUP1, GROUP2, NEWGROUP, IERROR)\fargs INTEGER GROUP1, GROUP2, NEWGROUP, IERROR}

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Group::Intersect(const~MPI::Group\&~group1, const~MPI::Group\&~group2)}{static MPI::Group}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1

\begin{funcdef}{MPI\_GROUP\_DIFFERENCE(group1, group2, newgroup)}
\funcarg{\IN}{group1}{ first group (handle)}
\funcarg{\IN}{group2}{ second group (handle)}
\funcarg{\OUT}{newgroup}{ difference group (handle)}
\end{funcdef}

\mpibind{MPI\_Group\_difference(MPI\_Group~group1, MPI\_Group~group2, MPI\_Group~*newgroup)}

\mpifbind{MPI\_GROUP\_DIFFERENCE(GROUP1, GROUP2, NEWGROUP, IERROR)\fargs INTEGER GROUP1, GROUP2, NEWGROUP, IERROR}

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Group::Difference(const~MPI::Group\&~group1, const~MPI::Group\&~group2)}{static MPI::Group}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1

\noindent The set-like operations are defined as follows:
\begin{description}
\item[union] all elements of the first group (\mpiarg{group1}), followed by all elements of the second group (\mpiarg{group2}) not in the first.
\item[intersect] all elements of the first group that are also in the second group, ordered as in the first group.
\item[difference] all elements of the first group that are not in the second group, ordered as in the first group.
\end{description}
Note that for these operations the order of processes in the output group is determined primarily by order in the first group (if possible) and then, if necessary, by order in the second group. Neither union nor intersection is commutative, but both are associative.

The new group can be empty, that is, equal to \const{MPI\_GROUP\_EMPTY}.
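As a sketch of these ordering rules, the following code builds two small, arbitrarily chosen groups from the group of \const{MPI\_COMM\_WORLD} (using \func{MPI\_GROUP\_INCL}, defined below) and combines them; it assumes an execution with at least four processes.
\begin{verbatim}
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Group world_group, g1, g2, uni, inter, diff;
    int r1[3] = {0, 1, 2};   /* arbitrary member lists, for illustration */
    int r2[3] = {2, 3, 0};
    int size;

    MPI_Init(&argc, &argv);  /* assumes at least 4 processes */
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Group_incl(world_group, 3, r1, &g1);  /* g1 = {0,1,2} */
    MPI_Group_incl(world_group, 3, r2, &g2);  /* g2 = {2,3,0} */

    MPI_Group_union(g1, g2, &uni);           /* {0,1,2,3}: g1, then new g2 members */
    MPI_Group_intersection(g1, g2, &inter);  /* {0,2}: in both, ordered as in g1   */
    MPI_Group_difference(g1, g2, &diff);     /* {1}: in g1 but not in g2           */

    MPI_Group_size(uni, &size);
    printf("The union group has %d members\n", size);

    MPI_Group_free(&g1); MPI_Group_free(&g2);
    MPI_Group_free(&uni); MPI_Group_free(&inter); MPI_Group_free(&diff);
    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}
\end{verbatim}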
\begin{funcdef}{MPI\_GROUP\_INCL(group, n, ranks, newgroup)}
\funcarg{\IN}{group}{ group (handle)}
\funcarg{\IN}{n}{ number of elements in array ranks (and size of \mpiarg{newgroup}) (integer)}
\funcarg{\IN}{ranks}{ ranks of processes in \mpiarg{group} to appear in \mpiarg{newgroup} (array of integers)}
\funcarg{\OUT}{newgroup}{ new group derived from above, in the order defined by \mpiarg{ ranks} (handle)}
\end{funcdef}

\mpibind{MPI\_Group\_incl(MPI\_Group~group, int~n, int~*ranks, MPI\_Group~*newgroup)}

\mpifbind{MPI\_GROUP\_INCL(GROUP, N, RANKS, NEWGROUP, IERROR)\fargs INTEGER GROUP, N, RANKS(*), NEWGROUP, IERROR}

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Group::Incl(int~n, const~int~ranks[]) const}{MPI::Group}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1

The function \func{MPI\_GROUP\_INCL} creates a group \mpiarg{newgroup} that consists of the \mpiarg{n} processes in \mpiarg{group} with ranks \mpiarg{ranks[0], $\ldots$, ranks[n-1]}; the process with rank \mpiarg{i} in \mpiarg{newgroup} is the process with rank \mpiarg{ranks[i]} in \mpiarg{group}. Each of the \mpiarg{n} elements of \mpiarg{ranks} must be a valid rank in \mpiarg{group} and all elements must be distinct, or else the program is erroneous. If \mpiarg{n}$~=~0$, then \mpiarg{newgroup} is \const{MPI\_GROUP\_EMPTY}. This function can, for instance, be used to reorder the elements of a group. See also \func{MPI\_GROUP\_COMPARE}.

\begin{funcdef}{MPI\_GROUP\_EXCL(group, n, ranks, newgroup)}
\funcarg{\IN}{group}{ group (handle)}
\funcarg{\IN}{n}{ number of elements in array ranks (integer)}
\funcarg{\IN}{ranks}{ array of integer ranks in \mpiarg{group} not to appear in \mpiarg{newgroup}}
\funcarg{\OUT}{newgroup}{ new group derived from above, preserving the order defined by \mpiarg{ group} (handle)}
\end{funcdef}

\mpibind{MPI\_Group\_excl(MPI\_Group~group, int~n, int~*ranks, MPI\_Group~*newgroup)}

\mpifbind{MPI\_GROUP\_EXCL(GROUP, N, RANKS, NEWGROUP, IERROR)\fargs INTEGER GROUP, N, RANKS(*), NEWGROUP, IERROR}

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Group::Excl(int~n, const~int~ranks[]) const}{MPI::Group}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1

The function \func{MPI\_GROUP\_EXCL} creates a group of processes \mpiarg{newgroup} that is obtained by deleting from \mpiarg{group} those processes with ranks \mpiarg{ranks[0], $\ldots$, ranks[n-1]}. The ordering of processes in \mpiarg{newgroup} is identical to the ordering in \mpiarg{group}. Each of the \mpiarg{n} elements of \mpiarg{ranks} must be a valid rank in \mpiarg{group} and all elements must be distinct; otherwise, the program is erroneous. If \mpiarg{n}$~=~0$, then \mpiarg{newgroup} is identical to \mpiarg{group}.
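The following sketch applies both constructors to the group of \const{MPI\_COMM\_WORLD}: it includes all ranks in reverse order, and excludes rank zero; it assumes an execution with at least two processes.
\begin{verbatim}
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Group world_group, reversed, workers;
    int size, i, result, zero = 0;
    int *ranks;

    MPI_Init(&argc, &argv);  /* assumes at least 2 processes */
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Group_size(world_group, &size);

    /* Reorder: include all world ranks in reverse order */
    ranks = (int *) malloc(size * sizeof(int));
    for (i = 0; i < size; i++)
        ranks[i] = size - 1 - i;
    MPI_Group_incl(world_group, size, ranks, &reversed);

    /* Same members, different order: compare yields MPI_SIMILAR */
    MPI_Group_compare(world_group, reversed, &result);
    printf("similar: %d\n", result == MPI_SIMILAR);

    /* Subset: all processes except world rank 0 */
    MPI_Group_excl(world_group, 1, &zero, &workers);

    free(ranks);
    MPI_Group_free(&reversed);
    MPI_Group_free(&workers);
    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}
\end{verbatim}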
\begin{funcdef}{MPI\_GROUP\_RANGE\_INCL(group, n, ranges, newgroup)}
\funcarg{\IN}{group}{ group (handle)}
\funcarg{\IN}{n}{ number of triplets in array \mpiarg{ranges} (integer) }
%MPI-1.2
\funcarg{\IN}{ranges}{ a \ADD{MPI-2, p.\ 31}{one-dimensional} array of integer triplets, of the form (first rank, last rank, stride) indicating ranks in \mpiarg{group} of processes to be included in \mpiarg{newgroup}}
\funcarg{\OUT}{newgroup}{ new group derived from above, in the order defined by \mpiarg{ranges} (handle) }
\end{funcdef}

\mpibind{MPI\_Group\_range\_incl(MPI\_Group~group, int~n, int~ranges[][3], MPI\_Group~*newgroup)}

\mpifbind{MPI\_GROUP\_RANGE\_INCL(GROUP, N, RANGES, NEWGROUP, IERROR)\fargs INTEGER GROUP, N, RANGES(3,*), NEWGROUP, IERROR}

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Group::Range\_incl(int~n, const~int~ranges[][3]) const}{MPI::Group}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1

\noindent If \mpiarg{ranges} consists of the triplets
\[ (first_1 , last_1, stride_1) , ..., (first_n, last_n, stride_n) \]
then \mpiarg{newgroup} consists of the sequence of processes in \mpiarg{group} with ranks
\[ first_1 , first_1 + stride_1 , ... , first_1 + \left\lfloor \frac{last_1 - first_1}{stride_1} \right\rfloor stride_1 , ... \]
\[ first_n , first_n + stride_n , ... , first_n + \left\lfloor \frac{last_n - first_n}{stride_n} \right\rfloor stride_n . \]

Each computed rank must be a valid rank in \mpiarg{group} and all computed ranks must be distinct, or else the program is erroneous. Note that we may have $first_i > last_i$, and $stride_i$ may be negative, but cannot be zero.

The functionality of this routine is specified to be equivalent to expanding the array of ranges to an array of the included ranks and passing the resulting array of ranks and other arguments to \func{MPI\_GROUP\_INCL}. A call to \func{MPI\_GROUP\_INCL} is equivalent to a call to \func{MPI\_GROUP\_RANGE\_INCL} with each rank \mpiarg{i} in \mpiarg{ranks} replaced by the triplet {\tt (i,i,1)} in the argument \mpiarg{ranges}.

\begin{funcdef}{MPI\_GROUP\_RANGE\_EXCL(group, n, ranges, newgroup)}
\funcarg{\IN}{group}{ group (handle)}
%MPI-1.2
\funcarg{\IN}{n}{ number of elements in array \CHANGE{MPI-2, p.\ 32}{ranks}\INTO{ranges} (integer)}
\funcarg{\IN}{ranges}{ a one-dimensional array of integer triplets of the form (first rank, last rank, stride), indicating the ranks in \mpiarg{group} of processes to be excluded from the output group \mpiarg{newgroup}. }
\funcarg{\OUT}{newgroup}{ new group derived from above, preserving the order in \mpiarg{group} (handle)}
\end{funcdef}

\mpibind{MPI\_Group\_range\_excl(MPI\_Group~group, int~n, int~ranges[][3], MPI\_Group~*newgroup)}

\mpifbind{MPI\_GROUP\_RANGE\_EXCL(GROUP, N, RANGES, NEWGROUP, IERROR)\fargs INTEGER GROUP, N, RANGES(3,*), NEWGROUP, IERROR}

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Group::Range\_excl(int~n, const~int~ranges[][3]) const}{MPI::Group}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1

\noindent Each computed rank must be a valid rank in \mpiarg{group} and all computed ranks must be distinct, or else the program is erroneous.

The functionality of this routine is specified to be equivalent to expanding the array of ranges to an array of the excluded ranks and passing the resulting array of ranks and other arguments to \func{MPI\_GROUP\_EXCL}.
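The following sketch illustrates this equivalence for the exclusion case: a single triplet and the corresponding explicit rank list produce the same group. The specific ranks are arbitrary, and the code assumes an execution with at least seven processes so that all computed ranks are valid.
\begin{verbatim}
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Group world_group, odds_a, odds_b;
    int result;
    int ranges[1][3] = { {0, 6, 2} };   /* triplet expands to ranks 0, 2, 4, 6 */
    int ranks[4]     = { 0, 2, 4, 6 };  /* the same ranks, enumerated */

    MPI_Init(&argc, &argv);  /* assumes at least 7 processes */
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    /* Exclude ranks 0,2,4,6 by range, and again by explicit enumeration */
    MPI_Group_range_excl(world_group, 1, ranges, &odds_a);
    MPI_Group_excl(world_group, 4, ranks, &odds_b);

    /* Both constructions are specified to yield the same group */
    MPI_Group_compare(odds_a, odds_b, &result);
    printf("identical: %d\n", result == MPI_IDENT);

    MPI_Group_free(&odds_a);
    MPI_Group_free(&odds_b);
    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}
\end{verbatim}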
A call to \func{MPI\_GROUP\_EXCL} is equivalent to a call to \func{MPI\_GROUP\_RANGE\_EXCL} with each rank \mpiarg{i} in \mpiarg{ranks} replaced by the triplet {\tt (i,i,1)} in the argument \mpiarg{ranges}.

\begin{users}
The range operations do not explicitly enumerate ranks, and therefore are more scalable if implemented efficiently. Hence, we recommend that \MPI/ programmers use them whenever possible, as high-quality implementations will take advantage of this fact.
\end{users}

\begin{implementors}
The range operations should be implemented, if possible, without enumerating the group members, in order to obtain better scalability (time and space).
\end{implementors}

\subsection{Group Destructors}
\label{subsec:context-grpdest}

\begin{funcdef}{MPI\_GROUP\_FREE(group)}
\funcarg{\INOUT}{group}{ group (handle)}
\end{funcdef}

\mpibind{MPI\_Group\_free(MPI\_Group~*group)}

\mpifbind{MPI\_GROUP\_FREE(GROUP, IERROR)\fargs INTEGER GROUP, IERROR}

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Group::Free()}{void}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1

This operation marks a group object for deallocation. The handle \mpiarg{group} is set to \const{MPI\_GROUP\_NULL} by the call. Any on-going operation using this group will complete normally.

\begin{implementors}
One can keep a reference count that is incremented for each call to \func{MPI\_COMM\_CREATE} and \func{MPI\_COMM\_DUP}, and decremented for each call to \func{MPI\_GROUP\_FREE} or \func{MPI\_COMM\_FREE}; the group object is ultimately deallocated when the reference count drops to zero.
\end{implementors}

\section{Communicator Management}
% Passed 21-0-1

This section describes the manipulation of communicators in \MPI/. Operations that access communicators are local and their execution does not require interprocess communication. Operations that create communicators are collective and may require interprocess communication.

\begin{implementors}
High-quality implementations should amortize the overheads associated with the creation of communicators (for the same group, or subsets thereof) over several calls, by allocating multiple contexts with one collective communication.
\end{implementors}

\subsection{Communicator Accessors}
\label{subsec:context-intracommacc}

The following are all local operations.

\begin{funcdef}{MPI\_COMM\_SIZE(comm, size)}
\funcarg{\IN}{comm}{ communicator (handle)}
\funcarg{\OUT}{size}{ number of processes in the group of \mpiarg{ comm} (integer)}
\end{funcdef}

\mpibind{MPI\_Comm\_size(MPI\_Comm~comm, int~*size)}

\mpifbind{MPI\_COMM\_SIZE(COMM, SIZE, IERROR)\fargs INTEGER COMM, SIZE, IERROR}

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Comm::Get\_size() const}{int}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1

\begin{rationale}
This function is equivalent to accessing the communicator's group with \mpifunc{MPI\_COMM\_GROUP} (see above), computing the size using \mpifunc{MPI\_GROUP\_SIZE}, and then freeing the temporary group via \mpifunc{MPI\_GROUP\_FREE}. However, this function is so commonly used that this shortcut was introduced.
\end{rationale}

\begin{users}
This function indicates the number of processes involved in a communicator. For \const{MPI\_COMM\_WORLD}, it indicates the total number of processes available (for this version of \MPI/, there is no standard way to change the number of processes once initialization has taken place).
This call is often used with the next call to determine the amount of concurrency available for a specific library or program. The following call, \mpifunc{MPI\_COMM\_RANK} indicates the rank of the process that calls it in the range from $0\ldots$\mpiarg{size}$-1$, where \mpiarg{size} is the return value of \mpifunc{MPI\_COMM\_SIZE}.\end{users} \begin{funcdef}{MPI\_COMM\_RANK(comm, rank)} \funcarg{\IN}{comm}{ communicator (handle)} \funcarg{\OUT}{rank}{ rank of the calling process in group of \mpiarg{ comm} (integer)} \end{funcdef} \mpibind{MPI\_Comm\_rank(MPI\_Comm~comm, int~*rank)} \mpifbind{MPI\_COMM\_RANK(COMM, RANK, IERROR)\fargs INTEGER COMM, RANK, IERROR} \mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed \mpicppemptybind{MPI::Comm::Get\_rank() const}{int} \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 \snir \begin{rationale} This function is equivalent to accessing the communicator's group with \mpifunc{MPI\_COMM\_GROUP} (see above), computing the rank using \mpifunc{MPI\_GROUP\_RANK}, and then freeing the temporary group via \mpifunc{MPI\_GROUP\_FREE}. However, this function is so commonly used, that this shortcut was introduced. \end{rationale} \rins \begin{users} This function gives the rank of the process in the particular communicator's group. It is useful, as noted above, in conjunction with \mpifunc{MPI\_COMM\_SIZE}. Many programs will be written with the master-slave model, where one process (such as the rank-zero process) will play a supervisory role, and the other processes will serve as compute nodes. In this framework, the two preceding calls are useful for determining the roles of the various processes of a communicator. \end{users} % % WHERE MPI_COMM_GROUP USED TO BE % \begin{funcdef}{MPI\_COMM\_COMPARE(comm1, comm2, result)} \funcarg{\IN}{comm1}{ first communicator (handle)} \funcarg{\IN}{comm2}{ second communicator (handle)} \funcarg{\OUT}{result}{ result (integer)} \end{funcdef} \mpibind{MPI\_Comm\_compare(MPI\_Comm~comm1,MPI\_Comm~comm2,~int~*result)} \mpifbind{MPI\_COMM\_COMPARE(COMM1, COMM2, RESULT, IERROR)\fargs INTEGER COMM1, COMM2, RESULT, IERROR} \mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed \mpicppemptybind{MPI::Comm::Compare(const~MPI::Comm\&~comm1, const~MPI::Comm\&~comm2)}{static int} \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 \noindent\const{MPI\_IDENT} results if and only if \mpiarg{comm1} and \mpiarg{comm2} are handles for the same object (identical groups and same contexts). \const{MPI\_CONGRUENT} results if the underlying groups are identical in constituents and rank order; these communicators differ only by context. \const{MPI\_SIMILAR} results if the group members of both communicators are the same but the rank order differs. \const{MPI\_UNEQUAL} results otherwise. \mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines \mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.144 l.45 - p.145 l.5 , File 1.3/context.tex, lines 723-736 \subsection{Communicator Constructors} \label{subsec:context-intracomconst} The following are collective functions that are invoked by all processes in the \mpiiidotiMergeNEWforSINGLEbegin% MPI-2.1 round-two - begin of modification % group associated with \mpiarg{comm}. group or groups associated with \mpiarg{comm}. 
\mpiiidotiMergeNEWforSINGLEendII% MPI-2.1 round-two - end of modification \begin{rationale} Note that there is a chicken-and-egg aspect to \MPI/ in that a communicator is needed to create a new communicator. The base communicator for all \MPI/ communicators is predefined outside of \MPI/, and is \const{MPI\_COMM\_WORLD}. This model was arrived at after considerable debate, and was chosen to increase ``safety'' of programs written in \MPI/. \end{rationale} \mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines \mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Chap. 7, p.145 l.25-42, File 2.0/collective-2.tex, lines 104-137 \mpiiidotiMergeNEWforSINGLEbegin% MPI-2.1 round-two - begin of modification The \MPI/ interface provides four communicator construction routines that apply to both intracommunicators and intercommunicators. The construction routine \mpifunc{MPI\_INTERCOMM\_CREATE} (discussed later) applies only to intercommunicators. % \subsubsection{Intercommunicator Constructors} % \label{sec:MPI-const} % % \status{Passed twice} % % The current \MPI/ interface provides only two intercommunicator % construction routines: % \begin{itemize} % \item \mpifunc{MPI\_COMM\_SPLIT}, % creates an intercommunicator from two intracommunicators, % \item \mpifunc{MPI\_INTERCOMM\_CREATE}, % duplicates an existing intercommunicator (or intracommunicator). % \end{itemize} % % \noindent % \mpiiidotiMergeFromREVIEWbegin{9.f}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008 % % The % In \MPII/, the % \mpiiidotiMergeFromREVIEWendII{9.f}% MPI-2.1 End of review based correction % other communicator constructors, \mpifunc{MPI\_COMM\_CREATE} and % \mpifunc{MPI\_COMM\_SPLIT}, % \mpiiidotiMergeFromREVIEWbegin{9.f}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008 % % currently apply % applied % \mpiiidotiMergeFromREVIEWendII{9.f}% MPI-2.1 End of review based correction % only to intracommunicators. % These operations in fact have well-defined semantics for % intercommunicators \cite{skjellum.doss.viswanathan.94}. % %One other intercommunicator constructor for % %partitioning intracommunicators into multiple intercommunicators is % %also proposed. % In the following discussions, the two groups in an intercommunicator are An intracommunicator involves a single group while an intercommunicator involves two groups. Where the following discussions address intercommunicator semantics, the two groups in an intercommunicator are \mpiiidotiMergeNEWforSINGLEendII% MPI-2.1 round-two - end of modification called the {\em left} and {\em right} groups. A process in an intercommunicator is a member of either the left or the right group. From the point of view of that process, the group that the process is a member of is called the {\em local} group; the other group (relative to that process) is the {\em remote} group. The left and right group labels give us a way to describe the two groups in an intercommunicator that is not relative to any particular process (as the local and remote groups are). \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines \mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 
5, p.145 l.8 - p.145 l.37, File 1.3/context.tex, lines 737-774

\begin{funcdef}{MPI\_COMM\_DUP(comm, newcomm)}
\funcarg{\IN}{comm}{ communicator (handle)}
\funcarg{\OUT}{newcomm}{ copy of \mpiarg{comm} (handle)}
\end{funcdef}

\mpibind{MPI\_Comm\_dup(MPI\_Comm~comm, MPI\_Comm~*newcomm)}

\mpifbind{MPI\_COMM\_DUP(COMM, NEWCOMM, IERROR)\fargs INTEGER COMM, NEWCOMM, IERROR}

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Intracomm::Dup() const}{MPI::Intracomm}
\mpicppemptybind{MPI::Intercomm::Dup() const}{MPI::Intercomm}
\mpicppemptybind{MPI::Cartcomm::Dup() const}{MPI::Cartcomm}
\mpicppemptybind{MPI::Graphcomm::Dup() const}{MPI::Graphcomm}
\mpicppemptybind{MPI::Comm::Clone() const = 0}{MPI::Comm\&}
\mpicppemptybind{MPI::Intracomm::Clone() const}{MPI::Intracomm\&}
\mpicppemptybind{MPI::Intercomm::Clone() const}{MPI::Intercomm\&}
\mpicppemptybind{MPI::Cartcomm::Clone() const}{MPI::Cartcomm\&}
\mpicppemptybind{MPI::Graphcomm::Clone() const}{MPI::Graphcomm\&}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1

\func{MPI\_COMM\_DUP} duplicates the existing communicator \mpiarg{comm} with associated key values. For each key value, the respective copy callback function determines the attribute value associated with this key in the new communicator; one particular action that a copy callback may take is to delete the attribute from the new communicator. Returns in \mpiarg{newcomm} a new
\mpiiidotiMergeNEWforSINGLEbegin% MPI-2.1 round-two - begin of modification
% communicator with the same group, any copied cached information,
communicator with the same group or groups, any copied cached information,
\mpiiidotiMergeNEWforSINGLEendI% MPI-2.1 round-two - end of modification
but a new context (see section~\ref{subsec:context-cachefunc}).

\begin{users}
This operation is used to provide a parallel library call with a duplicate communication space that has the same properties as the original communicator. This includes any attributes (see below), and topologies (see chapter~\ref{chap:topol}). This call is valid even if there are pending point-to-point communications involving the communicator \mpiarg{comm}. A typical call might involve a \func{MPI\_COMM\_DUP} at the beginning of the parallel call, and an \func{MPI\_COMM\_FREE} of that duplicated communicator at the end of the call. Other models of communicator management are also possible.

This call applies to both intra- and inter-communicators.
\end{users}

\begin{implementors}
One need not actually copy the group information, but only add a new reference and increment the reference count. Copy on write can be used for the cached information.
\end{implementors}
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines

\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.145 l.38 - p.146 l.1 , File 1.3/context.tex, lines 775-785
\begin{funcdef}{MPI\_COMM\_CREATE(comm, group, newcomm)}
\funcarg{\IN}{comm}{communicator (handle)}
\funcarg{\IN}{group}{ Group, which is a subset of the group of \mpiarg{comm} (handle)}
\funcarg{\OUT}{newcomm}{ new communicator (handle)}
\end{funcdef}

\mpibind{MPI\_Comm\_create(MPI\_Comm~comm, MPI\_Group~group, MPI\_Comm~*newcomm)}

\mpifbind{MPI\_COMM\_CREATE(COMM, GROUP, NEWCOMM, IERROR)\fargs INTEGER COMM, GROUP, NEWCOMM, IERROR}
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
% MPI-2.1 - unused lines: MPI-2.0, Chap.
7, p.146 l.1-7, File 2.0/collective-2.tex, lines 146-152 (same as MPI-1.3 (but argument names different)) \mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Chap. 7, p.146 l.8-10, File 2.0/collective-2.tex, lines 153-157 \mpiiidotiMergeFromREVIEWbegin{22.f}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008 \mpicppemptybind{MPI::Intercomm::Create(const MPI::Group\&~group) const}{MPI::Intercomm} \begchangefiniii \mpicppemptybind{MPI::Intracomm::Create(const MPI::Group\& group) const}{MPI::Intracomm} \endchangefiniii \mpiiidotiMergeFromREVIEWendII{22.f}% MPI-2.1 End of review based correction \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines % MPI-2.1 - unused lines: MPI-2.0, Chap. 7, p.146 l.11-12, File 2.0/collective-2.tex, lines 158-160 (obsolete) \mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.146 l.2 - p.146 l.7 , File 1.3/context.tex, lines 786-794 \noindent If \mpiarg{comm} is an intra-communicator, this function creates a new communicator \mpiarg{newcomm} with communication group defined by \mpiarg{group} and a new context. No cached information propagates from \mpiarg{comm} to \mpiarg{newcomm}. The function returns \const{MPI\_COMM\_NULL} to processes that are not in \mpiarg{group}. The call is erroneous if not all \mpiarg{group} arguments have the same value, or if \mpiarg{group} is not a subset of the group associated with \mpiarg{comm}. Note that the call is to be executed by all processes in \mpiarg{comm}, even if they do not belong to the new group. \mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines % MPI-2.1 - unused lines: MPI-1.1, Chap. 5, p.146 l.8 , File 1.3/context.tex, lines 795-795 (obsolete) \mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.146 l.9 - p.146 l.37, File 1.3/context.tex, lines 796-838 \begin{rationale} The requirement that the entire group of \mpiarg{comm} participate in the call stems from the following considerations: \begin{itemize} \item It allows the implementation to layer \mpifunc{MPI\_COMM\_CREATE} on top of regular collective communications. \item It provides additional safety, in particular in the case where partially overlapping groups are used to create new communicators. \item It permits implementations sometimes to avoid communication related to context creation. \end{itemize} \end{rationale} \begin{users} \func{MPI\_COMM\_CREATE} provides a means to subset a group of processes for the purpose of separate MIMD computation, with separate communication space. \mpiarg{newcomm}, which emerges from \func{MPI\_COMM\_CREATE} can be used in subsequent calls to \func{MPI\_COMM\_CREATE} (or other communicator constructors) further to subdivide a computation into parallel sub-computations. A more general service is provided by \func{MPI\_COMM\_SPLIT}, below. \end{users} \begin{implementors} Since all processes calling \mpifunc{MPI\_COMM\_DUP} or \linebreak \mpifunc{MPI\_COMM\_CREATE} provide the same \mpiarg{group} argument, it is theoretically possible to agree on a group-wide unique context with no communication. However, local execution of these functions requires use of a larger context name space and reduces error checking. Implementations may strike various compromises between these conflicting goals, such as bulk allocation of multiple contexts in one collective operation. 
Important: If new communicators are created without synchronizing the processes involved, then the communication system should be able to cope with messages arriving in a context that has not yet been allocated at the receiving process.
\end{implementors}
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Chap. 7, p.146 l.13 - p.147 l.24, File 2.0/collective-2.tex, lines 161-223
\noindent If \mpiarg{comm} is an intercommunicator, then the output communicator is also an intercommunicator where the local group consists only of those processes contained in \mpiarg{group} (see Figure~\ref{fig:collective-create}). The \mpiarg{group} argument should only contain those processes in the local group of the input intercommunicator that are to be a part of \mpiarg{newcomm}. If either \mpiarg{group} does not specify at least one process in the local group of the intercommunicator, or if the calling process is not included in the \mpiarg{group}, \consti{MPI\_COMM\_NULL} is returned.

\begin{rationale}
In the case where either the left or right group is empty, a null communicator is returned instead of an intercommunicator with \consti{MPI\_GROUP\_EMPTY} because the side with the empty group must return \consti{MPI\_COMM\_NULL}.
\end{rationale}

%\discuss{In the case where either the left or right group is {\em not} empty,
% why not an intercommunicator with \consti{MPI\_GROUP\_EMPTY}? Does
% anyone remember the rationale? Is it just useless?}

\begin{figure}[htbp]
\centerline{\includegraphics[width=4.0in]{figures/collective-create}}
\caption[Intercommunicator create using \mpiskipfunc{MPI\_COMM\_CREATE}]{Intercommunicator create using \mpifunc{MPI\_COMM\_CREATE} extended to intercommunicators. The input groups are those in the grey circle.}
\label{fig:collective-create}
\end{figure}

\begin{example}
The following example illustrates how the first node in the left side of an intercommunicator could be joined with all members on the right side of an intercommunicator to form a new intercommunicator.
\exindex{MPI\_Comm\_create}
\exindex{MPI\_Comm\_group}
\exindex{MPI\_Group\_incl}
\exindex{MPI\_Group\_free}
\exindex{Intercommunicator}
%%HEADER
%%LANG: C
%%FRAGMENT
%%SKIPELIPSIS
%%SUBST:/\* I'm on the left side of the intercommunicator \*/:1
%%ENDHEADER
\begin{verbatim}
MPI_Comm  inter_comm, new_inter_comm;
MPI_Group local_group, group;
int       rank = 0; /* rank on left side to include in new inter-comm */

/* Construct the original intercommunicator: "inter_comm" */
...

/* Construct the group of processes to be in new intercommunicator */
if (/* I'm on the left side of the intercommunicator */) {
    MPI_Comm_group ( inter_comm, &local_group );
    MPI_Group_incl ( local_group, 1, &rank, &group );
    MPI_Group_free ( &local_group );
}
else
    MPI_Comm_group ( inter_comm, &group );

MPI_Comm_create ( inter_comm, group, &new_inter_comm );
MPI_Group_free( &group );
\end{verbatim}
\end{example}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines

\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap.
5, p.146 l.40 - p.147 l.2 , File 1.3/context.tex, lines 839-849
\begin{funcdef}{MPI\_COMM\_SPLIT(comm, color, key, newcomm)}
\funcarg{\IN}{comm}{communicator (handle)}
\funcarg{\IN}{color}{control of subset assignment (integer)}
\funcarg{\IN}{key}{ control of rank assignment (integer)}
\funcarg{\OUT}{newcomm}{ new communicator (handle)}
\end{funcdef}

\mpibind{MPI\_Comm\_split(MPI\_Comm~comm, int~color, int~key, MPI\_Comm~*newcomm)}

\mpifbind{MPI\_COMM\_SPLIT(COMM, COLOR, KEY, NEWCOMM, IERROR)\fargs INTEGER COMM, COLOR, KEY, NEWCOMM, IERROR}
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
% MPI-2.1 - unused lines: MPI-2.0, Chap. 7, p.147 l.27-33, File 2.0/collective-2.tex, lines 224-230 (same as MPI-1.3 (but argument names different))
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Chap. 7, p.147 l.34-36, File 2.0/collective-2.tex, lines 231-235
\mpicppemptybind{MPI::Intercomm::Split(int color, int key) const}{MPI::Intercomm}
\begchangefiniii
\mpicppemptybind{MPI::Intracomm::Split(int color, int key) const}{MPI::Intracomm}
\endchangefiniii
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
% MPI-2.1 - unused lines: MPI-2.0, Chap. 7, p.147 l.37-38, File 2.0/collective-2.tex, lines 236-238 (obsolete)
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.147 l.3 - p.147 l.14, File 1.3/context.tex, lines 850-872
\noindent This function partitions the group associated with \mpiarg{comm} into disjoint subgroups, one for each value of \mpiarg{color}. Each subgroup contains all processes of the same color. Within each subgroup, the processes are ranked in the order defined by the value of the argument \mpiarg{key}, with ties broken according to their rank in the old group. A new communicator is created for each subgroup and returned in \mpiarg{newcomm}. A process may supply the color value \mpiarg{MPI\_UNDEFINED}, in which case \mpiarg{newcomm} returns \const{MPI\_COMM\_NULL}. This is a collective call, but each process is permitted to provide different values for \mpiarg{color} and \mpiarg{key}.

A call to \func{MPI\_COMM\_CREATE(comm, group, newcomm)} is equivalent to \linebreak a call to \func{MPI\_COMM\_SPLIT(comm, color, key, newcomm)}, where all members of \mpiarg{group} provide \mpiarg{color}$~ =~0$ and \mpiarg{key}$~=~$ rank in \mpiarg{group}, and all processes that are not members of \mpiarg{group} provide \mpiarg{color}$~ =~$ \mpiarg{MPI\_UNDEFINED}. The function \func{MPI\_COMM\_SPLIT} allows more general partitioning of a group into one or more subgroups with optional reordering.
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
% MPI-2.1 - unused lines: MPI-1.1, Chap. 5, p.147 l.14-15 , File 1.3/context.tex, lines 873-873 (obsolete)
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.147 l.16 - p.147 l.38, File 1.3/context.tex, lines 874-909
\snir
The value of \mpiarg{color} must be nonnegative.
\rins
\begin{users}
This is an extremely powerful mechanism for dividing a single communicating group of processes into $k$ subgroups, with $k$ chosen implicitly by the user (by the number of colors asserted over all the processes). Each resulting communicator will be non-overlapping. Such a division could be useful for defining a hierarchy of computations, such as for multigrid, or linear algebra.

Multiple calls to \func{MPI\_COMM\_SPLIT} can be used to overcome the requirement that any call have no overlap of the resulting communicators (each process is of only one color per call).
In this way, multiple overlapping communication structures can be created. Creative use of the \mpiarg{color} and \mpiarg{key} in such splitting operations is encouraged.

Note that, for a fixed color, the keys need not be unique. It is \func{MPI\_COMM\_SPLIT}'s responsibility to sort processes in ascending order according to this key, and to break ties in a consistent way. If all the keys are specified in the same way, then all the processes in a given color will have the same relative rank order as they did in their parent group. (In general, they will have different ranks.)

Essentially, making the key value zero for all processes of a given color means that one doesn't really care about the rank-order of the processes in the new communicator.
\end{users}
\snir
\begin{rationale}
\mpiarg{color} is restricted to be nonnegative, so as not to conflict with the value assigned to \const{MPI\_UNDEFINED}.
\end{rationale}
\rins
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines

\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Chap. 7, p.147 l.39 - p.149 l.35, File 2.0/collective-2.tex, lines 239-315
\noindent The result of \mpifunc{MPI\_COMM\_SPLIT} on an intercommunicator is that those processes on the left with the same \mpiarg{color} as those processes on the right combine to create a new intercommunicator. The \mpiarg{key} argument describes the relative rank of processes on each side of the intercommunicator (see Figure~\ref{fig:collective-split}). For those colors that are specified only on one side of the intercommunicator, \consti{MPI\_COMM\_NULL} is returned. \consti{MPI\_COMM\_NULL} is also returned to those processes that specify \consti{MPI\_UNDEFINED} as the color.

%\discuss{In the case where either the left or right group is {\em not} empty,
% why not an intercommunicator with \consti{MPI\_GROUP\_EMPTY}? Does
% anyone remember the rationale? Is it just useless?}

\begin{figure}[htbp]
\centerline{\includegraphics[width=4.0in]{figures/collective-split2}}
\caption[Intercommunicator construction with \mpiskipfunc{MPI\_COMM\_SPLIT}]{Intercommunicator construction achieved by splitting an existing intercommunicator with \mpifunc{MPI\_COMM\_SPLIT} extended to intercommunicators.}
\label{fig:collective-split}
\end{figure}

\begin{example}(Parallel client-server model). The following client code illustrates how clients on the left side of an intercommunicator could be assigned to a single server from a pool of servers on the right side of an intercommunicator.
\exindex{MPI\_Comm\_split}
\exindex{MPI\_Comm\_remote\_size}
\exindex{Intercommunicator}
%%HEADER
%%LANG: C
%%FRAGMENT
%%SKIPELIPSIS
%%ENDHEADER
\begin{verbatim}
/* Client code */
MPI_Comm  multiple_server_comm;
MPI_Comm  single_server_comm;
int       color, rank, num_servers;

/* Create intercommunicator with clients and servers:
   multiple_server_comm */
...

/* Find out the number of servers available */
MPI_Comm_remote_size ( multiple_server_comm, &num_servers );

/* Determine my color */
MPI_Comm_rank ( multiple_server_comm, &rank );
color = rank % num_servers;

/* Split the intercommunicator */
MPI_Comm_split ( multiple_server_comm, color, rank,
                 &single_server_comm );
\end{verbatim}

\noindent The following is the corresponding server code:
%%HEADER
%%LANG: C
%%FRAGMENT
%%SKIPELIPSIS
%%ENDHEADER
\begin{verbatim}
/* Server code */
MPI_Comm multiple_client_comm;
MPI_Comm single_server_comm;
int      rank;

/* Create intercommunicator with clients and servers:
   multiple_client_comm */
...
/* Split the intercommunicator for a single server per
   group of clients */
MPI_Comm_rank ( multiple_client_comm, &rank );
MPI_Comm_split ( multiple_client_comm, rank, 0,
                 &single_server_comm );
\end{verbatim}
\end{example}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.147 l.40 - p.167 l.34, File 1.3/context.tex, lines 910-2065
\subsection{Communicator Destructors}
\label{subsec:context-intracomdest}
\begin{funcdef}{MPI\_COMM\_FREE(comm)}
\funcarg{\INOUT}{comm}{communicator to be destroyed (handle)}
\end{funcdef}
\mpibind{MPI\_Comm\_free(MPI\_Comm~*comm)}
\mpifbind{MPI\_COMM\_FREE(COMM, IERROR)\fargs INTEGER COMM, IERROR}
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Comm::Free()}{void}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1
This collective operation marks the communication object for deallocation. The handle is set to \const{MPI\_COMM\_NULL}. Any pending operations that use this communicator will complete normally; the object is actually deallocated only if there are no other active references to it. This call applies to intra- and inter-communicators. The delete callback functions for all cached attributes (see section~\ref{sec:caching}) are called in arbitrary order.
\begin{implementors}
A reference-count mechanism may be used: the reference count is incremented by each call to \func{MPI\_COMM\_DUP}, and decremented by each call to \func{MPI\_COMM\_FREE}. The object is ultimately deallocated when the count reaches zero.

Though collective, it is anticipated that this operation will normally be implemented to be local; a debugging version of an \MPI/ library might choose to synchronize.
\end{implementors}
%----------------------------------------------------------------------
\section{Motivating Examples}
\subsection{Current Practice \#1}
\label{context-ex1}
\noindent
Example \#1a:
\begin{verbatim}
main(int argc, char **argv)
{
  int me, size;
  ...
  MPI_Init ( &argc, &argv );
  MPI_Comm_rank (MPI_COMM_WORLD, &me);
  MPI_Comm_size (MPI_COMM_WORLD, &size);

  (void)printf ("Process %d size %d\n", me, size);
  ...
  MPI_Finalize();
}
\end{verbatim}
Example \#1a is a do-nothing program that initializes itself legally,
%MPI-1.2-review-2008.03.13
and refers to the\DELETE{MPI-1.2-review-Rainer-2008.03.13}{ the} ``all'' communicator, and prints a message. It terminates itself legally too. This example does not imply that \MPI/ supports {\tt printf}-like communication itself.
\noindent
Example \#1b (supposing that {\tt size} is even):
\begin{verbatim}
main(int argc, char **argv)
{
  int me, size;
  int SOME_TAG = 0;
  ...
  MPI_Init(&argc, &argv);

  MPI_Comm_rank(MPI_COMM_WORLD, &me);   /* local */
  MPI_Comm_size(MPI_COMM_WORLD, &size); /* local */

  if((me % 2) == 0)
  {
     /* send unless highest-numbered process */
     if((me + 1) < size)
        MPI_Send(..., me + 1, SOME_TAG, MPI_COMM_WORLD);
  }
  else
     MPI_Recv(..., me - 1, SOME_TAG, MPI_COMM_WORLD);

  ...
  MPI_Finalize();
}
\end{verbatim}
Example \#1b schematically illustrates message exchanges between ``even'' and ``odd'' processes in the ``all'' communicator.
\subsection{Current Practice \#2}
\label{context-ex2}
\begin{verbatim}
main(int argc, char **argv)
{
  int me, count;
  void *data;
  ...
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &me);

  if(me == 0)
  {
      /* get input, create buffer ``data'' */
      ...
  }

  MPI_Bcast(data, count, MPI_BYTE, 0, MPI_COMM_WORLD);
  ...
  MPI_Finalize();
}
\end{verbatim}
This example illustrates the use of a collective communication.
\subsection{(Approximate) Current Practice \#3}
\label{context-ex3}
\begin{verbatim}
main(int argc, char **argv)
{
  int me, count, count2;
  void *send_buf, *recv_buf, *send_buf2, *recv_buf2;
  MPI_Group MPI_GROUP_WORLD, grprem;
  MPI_Comm commslave;
  static int ranks[] = {0};
  ...
  MPI_Init(&argc, &argv);
  MPI_Comm_group(MPI_COMM_WORLD, &MPI_GROUP_WORLD);
  MPI_Comm_rank(MPI_COMM_WORLD, &me);   /* local */

  MPI_Group_excl(MPI_GROUP_WORLD, 1, ranks, &grprem);  /* local */
  MPI_Comm_create(MPI_COMM_WORLD, grprem, &commslave);

  if(me != 0)
  {
    /* compute on slave */
    ...
    MPI_Reduce(send_buf, recv_buf, count, MPI_INT, MPI_SUM, 1,
               commslave);
    ...
  }
  /* zero falls through immediately to this reduce, others do later... */
  MPI_Reduce(send_buf2, recv_buf2, count2, MPI_INT, MPI_SUM, 0,
             MPI_COMM_WORLD);

  MPI_Comm_free(&commslave);
  MPI_Group_free(&MPI_GROUP_WORLD);
  MPI_Group_free(&grprem);
  MPI_Finalize();
}
\end{verbatim}
This example illustrates how a group consisting of all but the zeroth process of the ``all'' group is created, and then how a communicator is formed (\mpiarg{commslave}) for that new group. The new communicator is used in a collective call, and all processes execute a collective call in the \const{MPI\_COMM\_WORLD} context. This example illustrates how the two communicators (that inherently possess distinct contexts) protect communication. That is, communication in \const{MPI\_COMM\_WORLD} is insulated from communication in \mpiarg{commslave}, and vice versa. In summary, ``group safety'' is achieved via communicators because distinct contexts within communicators are enforced to be unique on any process.
\subsection{Example \#4}
\label{context-ex4}
The following example is meant to illustrate ``safety'' between point-to-point and collective communication. \MPI/ guarantees that a single communicator can do safe point-to-point and collective communication.
\begin{verbatim}
#define TAG_ARBITRARY 12345
#define SOME_COUNT    50

main(int argc, char **argv)
{
  int me, i, count;
  MPI_Request request[2];
  MPI_Status status[2];
  MPI_Group MPI_GROUP_WORLD, subgroup;
  int ranks[] = {2, 4, 6, 8};
  MPI_Comm the_comm;
  ...
  MPI_Init(&argc, &argv);
  MPI_Comm_group(MPI_COMM_WORLD, &MPI_GROUP_WORLD);

  MPI_Group_incl(MPI_GROUP_WORLD, 4, ranks, &subgroup); /* local */
  MPI_Group_rank(subgroup, &me);                        /* local */

  MPI_Comm_create(MPI_COMM_WORLD, subgroup, &the_comm);

  if(me != MPI_UNDEFINED)
  {
    MPI_Irecv(buff1, count, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_ARBITRARY,
              the_comm, request);
    MPI_Isend(buff2, count, MPI_DOUBLE, (me+1)%4, TAG_ARBITRARY,
              the_comm, request+1);

    for(i = 0; i < SOME_COUNT; i++)
      MPI_Reduce(..., the_comm);

    MPI_Waitall(2, request, status);
    MPI_Comm_free(&the_comm);
  }

  MPI_Group_free(&MPI_GROUP_WORLD);
  MPI_Group_free(&subgroup);
  MPI_Finalize();
}
\end{verbatim}
\subsection{Library Example \#1}
\label{context-ex5}
The main program:
\begin{verbatim}
main(int argc, char **argv)
{
  int done = 0;
  user_lib_t *libh_a, *libh_b;
  void *dataset1, *dataset2;
  ...
  MPI_Init(&argc, &argv);
  ...
  init_user_lib(MPI_COMM_WORLD, &libh_a);
  init_user_lib(MPI_COMM_WORLD, &libh_b);
  ...
  user_start_op(libh_a, dataset1);
  user_start_op(libh_b, dataset2);
  ...
  while(!done)
  {
     /* work */
     ...
     MPI_Reduce(..., MPI_COMM_WORLD);
     ...
     /* see if done */
     ...
  }
  user_end_op(libh_a);
  user_end_op(libh_b);

  uninit_user_lib(libh_a);
  uninit_user_lib(libh_b);
  MPI_Finalize();
}
\end{verbatim}
\noindent
The user library initialization code:
\begin{verbatim}
void init_user_lib(MPI_Comm comm, user_lib_t **handle)
{
  user_lib_t *save;

  user_lib_initsave(&save); /* local */
  MPI_Comm_dup(comm, &(save -> comm));

  /* other inits */
  ...

  *handle = save;
}
\end{verbatim}
\noindent
User start-up code:
\begin{verbatim}
void user_start_op(user_lib_t *handle, void *data)
{
  MPI_Irecv( ..., handle->comm, &(handle -> irecv_handle) );
  MPI_Isend( ..., handle->comm, &(handle -> isend_handle) );
}
\end{verbatim}
\noindent
User communication clean-up code:
\begin{verbatim}
void user_end_op(user_lib_t *handle)
{
  MPI_Status status;

  MPI_Wait(&(handle -> isend_handle), &status);
  MPI_Wait(&(handle -> irecv_handle), &status);
}
\end{verbatim}
\noindent
User object clean-up code:
\begin{verbatim}
void uninit_user_lib(user_lib_t *handle)
{
  MPI_Comm_free(&(handle -> comm));
  free(handle);
}
\end{verbatim}
\subsection{Library Example \#2}
\label{context-ex6}
The main program:
%MPI-1.2
\CHANGE{Errata for MPI-1.1, p. 6, l. 30-38}{Replace \mpiarg{comm\_a} by \mpiarg{comm\_b} in check}
\begin{verbatim}
main(int argc, char **argv)
{
  int ma, mb;
  MPI_Group MPI_GROUP_WORLD, group_a, group_b;
  MPI_Comm comm_a, comm_b;

  static int list_a[] = {0, 1};
#if defined(EXAMPLE_2B) || defined(EXAMPLE_2C)
  static int list_b[] = {0, 2, 3};
#else /* EXAMPLE_2A */
  static int list_b[] = {0, 2};
#endif
  int size_list_a = sizeof(list_a)/sizeof(int);
  int size_list_b = sizeof(list_b)/sizeof(int);

  ...
  MPI_Init(&argc, &argv);
  MPI_Comm_group(MPI_COMM_WORLD, &MPI_GROUP_WORLD);

  MPI_Group_incl(MPI_GROUP_WORLD, size_list_a, list_a, &group_a);
  MPI_Group_incl(MPI_GROUP_WORLD, size_list_b, list_b, &group_b);

  MPI_Comm_create(MPI_COMM_WORLD, group_a, &comm_a);
  MPI_Comm_create(MPI_COMM_WORLD, group_b, &comm_b);

  if(comm_a != MPI_COMM_NULL)
     MPI_Comm_rank(comm_a, &ma);
  if(comm_b != MPI_COMM_NULL)
     MPI_Comm_rank(comm_b, &mb);

  if(comm_a != MPI_COMM_NULL)
     lib_call(comm_a);

  if(comm_b != MPI_COMM_NULL)
  {
     lib_call(comm_b);
     lib_call(comm_b);
  }

  if(comm_a != MPI_COMM_NULL)
    MPI_Comm_free(&comm_a);
  if(comm_b != MPI_COMM_NULL)
    MPI_Comm_free(&comm_b);
  MPI_Group_free(&group_a);
  MPI_Group_free(&group_b);
  MPI_Group_free(&MPI_GROUP_WORLD);
  MPI_Finalize();
}
\end{verbatim}
\noindent
The library:
\begin{verbatim}
void lib_call(MPI_Comm comm)
{
  int me, done = 0;
  MPI_Comm_rank(comm, &me);
  if(me == 0)
  {
     while(!done)
     {
        MPI_Recv(..., MPI_ANY_SOURCE, MPI_ANY_TAG, comm);
        ...
     }
  }
  else
  {
    /* work */
    MPI_Send(..., 0, ARBITRARY_TAG, comm);
    ...
  }
#ifdef EXAMPLE_2C
  /* include (resp, exclude) for safety (resp, no safety): */
  MPI_Barrier(comm);
#endif
}
\end{verbatim}
The above example is really three examples, depending on whether or not one includes rank 3 in \mpiarg{list\_b}, and whether or not a synchronize is included in \mpiskipfunc{lib\_call}. This example illustrates that, despite contexts, subsequent calls to \mpiskipfunc{lib\_call} with the same context need not be safe from one another (colloquially, ``back-masking''). Safety is realized if the \func{MPI\_Barrier} is added. What this demonstrates is that libraries have to be written carefully, even with contexts. When rank 3 is excluded, then the synchronize is not needed to get safety from back-masking.

Algorithms like ``reduce'' and ``allreduce'' have strong enough source selectivity properties so that they are inherently okay (no back-masking), provided that \MPI/ provides basic guarantees.
So are multiple calls to a typical tree-broadcast algorithm with the same root or different roots (see \cite{Skj91rev}). Here we rely on two guarantees of \MPI/: pairwise ordering of messages between processes in the same context, and source selectivity --- deleting either feature removes the guarantee that back-masking cannot be required.

Algorithms that try to do non-deterministic broadcasts or other calls that include wildcard operations will not generally have the good properties of the deterministic implementations of ``reduce,'' ``allreduce,'' and ``broadcast.'' Such algorithms would have to utilize the monotonically increasing tags (within a communicator scope) to keep things straight.

The foregoing discussion assumes that ``collective calls'' are implemented with point-to-point operations. \MPI/ implementations may or may not implement collective calls using point-to-point operations. These algorithms are used to illustrate the issues of correctness and safety, independent of how \MPI/ implements its collective calls. See also section~\ref{sec:formalizing}.
%----------------------------------------------------------------------
\section{Inter-Communication}
% Passed 20-1-1
This section introduces the concept of int\-er-com\-mun\-i\-cat\-ion and describes the portions of \MPI/ that support it. It describes support for writing programs that contain user-level servers.
\mpiiidotiMergeNEWforSINGLEbegin% MPI-2.1 round-two - begin of modification
% All point-to-point communication described thus far has involved
All communication described thus far has involved
\mpiiidotiMergeNEWforSINGLEendI% MPI-2.1 round-two - end of modification
communication between processes that are members of the same group. This type of communication is called ``int\-ra-com\-mun\-i\-cat\-ion'' and the communicator used is called an ``intra-communicator,'' as we have noted earlier in the chapter.

In modular and multi-disciplinary applications, different process groups execute distinct modules and processes within different modules communicate with one another in a pipeline or a more general module graph. In these applications, the most natural way for a process to specify a target process is by the rank of the target process within the target group. In applications that contain internal user-level servers, each server may be a process group that provides services to one or more clients, and each client may be a process group that uses the services of one or more servers. It is again most natural to specify the target process by rank within the target group in these applications. This type of communication is called ``int\-er-com\-mun\-i\-cat\-ion'' and the communicator used is called an ``inter-communicator,'' as introduced earlier.

An int\-er-com\-mun\-i\-cat\-ion is a point-to-point communication between processes in different groups. The group containing a process that initiates an int\-er-com\-mun\-i\-cat\-ion operation is called the ``local group,'' that is, the sender in a send and the receiver in a receive. The group containing the target process is called the ``remote group,'' that is, the receiver in a send and the sender in a receive. As in int\-ra-com\-mun\-i\-cat\-ion, the target process is specified using a \mpiarg{(communicator, rank)} pair. Unlike int\-ra-com\-mun\-i\-cat\-ion, the rank is relative to a second, remote group.
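As a non-normative illustration, assume {\tt intercomm} is an inter-communicator and that {\tt buf}, {\tt count}, and {\tt tag} are defined as usual; the rank argument of both calls below names a process in the remote group.
\begin{verbatim}
/* Non-normative sketch: on an inter-communicator, the rank
   argument names a process in the REMOTE group. */
MPI_Status status;
MPI_Send(buf, count, MPI_INT, 0, tag, intercomm); /* to remote rank 0 */
MPI_Recv(buf, count, MPI_INT, 0, tag, intercomm,
         &status);                              /* from remote rank 0 */
\end{verbatim}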
%MPI-1.2
\CHANGE{MPI-2, p.\ 25}{
All inter-communicator constructors are blocking and require that the local and remote groups be disjoint in order to avoid deadlock.
}
\INTO{
All inter-communicator constructors are blocking and require that the local and remote groups be disjoint.
}
%MPI-1.2
\ADD{MPI-2, p.\ 25}{}\ADD{Advice to users:}{}
\begin{users}
The groups must be disjoint for several reasons. Primarily, this is the intent of the intercommunicators --- to provide a communicator for communication between disjoint groups. This is reflected in the definition of \mpifunc{MPI\_INTERCOMM\_MERGE}, which allows the user to control the ranking of the processes in the created intracommunicator; this ranking makes little sense if the groups are not disjoint. In addition, the natural extension of collective operations to intercommunicators makes the most sense when the groups are disjoint.
\end{users}
Here is a summary of the properties of int\-er-com\-mun\-i\-cat\-ion and inter-communicators:
\begin{itemize}
\item
The syntax of point-to-point
\mpiiidotiMergeFromREVIEWbegin{6.e+9.g}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
and collective
\mpiiidotiMergeFromREVIEWendI{6.e+9.g}% MPI-2.1 End of review based correction
communication is the same for both inter- and int\-ra-com\-mun\-i\-cat\-ion. The same communicator can be used both for send and for receive operations.
\item
A target process is addressed by its rank in the remote group, both for sends and for receives.
\item
Communications using an inter-communicator are guaranteed not to conflict with any communications that use a different communicator.
\mpiiidotiMergeFromREVIEWbegin{6.e+9.g}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
% \item
% An inter-communicator cannot be used for collective communication.
\mpiiidotiMergeFromREVIEWendI{6.e+9.g}% MPI-2.1 End of review based correction
\item
A communicator will provide either intra- or int\-er-com\-mun\-i\-cat\-ion, never both.
\end{itemize}
\noindent
The routine \func{MPI\_COMM\_TEST\_INTER} may be used to determine if a communicator is an inter- or intra-communicator. Inter-communicators can be used as arguments to some of the other communicator access routines. Inter-communicators cannot be used as input to some of the constructor routines for intra-communicators (for instance, \mpifunc{MPI\_COMM\_CREATE}).
\begin{implementors}
For the purpose of point-to-point communication, communicators can be represented in each process by a tuple consisting of:
\begin{description}
\item[group]
\item[send\_context]
\item[receive\_context]
\item[source]
\end{description}
\noindent
For inter-communicators, {\bf group} describes the remote group, and {\bf source} is the rank of the process in the local group. For intra-communicators, {\bf group} is the communicator group (remote=local), {\bf source} is the rank of the process in this group, and {\bf send\_context} and {\bf receive\_context} are identical. A group is represented by a rank-to-absolute-address translation table.

The inter-communicator cannot be discussed sensibly without considering processes in both the local and remote groups. Imagine a process {\bf P} in group $\cal P$, which has an inter-communicator {\bf ${\bf C}_{\cal P}$}, and a process {\bf Q} in group $\cal Q$, which has an inter-communicator {\bf ${\bf C}_{\cal Q}$}. Then
\begin{itemize}
\item
{\bf ${\bf C}_{\cal P}$.group} describes the group $\cal Q$ and {\bf ${\bf C}_{\cal Q}$.group} describes the group $\cal P$.
\item
{\bf ${\bf C}_{\cal P}$.send\_context~=~${\bf C}_{\cal Q}$.receive\_context} and the context is unique in $\cal Q$; \\
{\bf ${\bf C}_{\cal P}$.receive\_context~=~ ${\bf C}_{\cal Q}$.send\_context} and this context is unique in $\cal P$.
\item
{\bf ${\bf C}_{\cal P}$.source} is the rank of {\bf P} in $\cal P$ and {\bf ${\bf C}_{\cal Q}$.source} is the rank of {\bf Q} in $\cal Q$.
\end{itemize}
Assume that {\bf P} sends a message to {\bf Q} using the inter-communicator. Then {\bf P} uses the {\bf group} table to find the absolute address of {\bf Q}; {\bf source} and {\bf send\_context} are appended to the message.

Assume that {\bf Q} posts a receive with an explicit source argument using the inter-communicator. Then {\bf Q} matches {\bf receive\_context} to the message context and the source argument to the message source.

The same algorithm is appropriate for intra-communicators as well.

In order to support inter-communicator accessors and constructors, it is necessary to supplement this model with additional structures that store information about the local communication group, and additional safe contexts.
\end{implementors}
\subsection{Inter-communicator Accessors}
\label{subsec:context-intercomacc}
\begin{funcdef}{MPI\_COMM\_TEST\_INTER(comm, flag)}
\funcarg{\IN}{comm}{communicator (handle)}
\funcarg{\OUT}{flag}{(logical)}
\end{funcdef}
\mpibind{MPI\_Comm\_test\_inter(MPI\_Comm~comm, int~*flag)}
\mpifbind{MPI\_COMM\_TEST\_INTER(COMM, FLAG, IERROR)\fargs INTEGER COMM, IERROR\\ LOGICAL FLAG}
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Comm::Is\_inter() const}{bool}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1
\noindent
This local routine allows the calling process to determine if a communicator is an inter-communicator or an intra-communicator. It returns \const{true} if it is an inter-communicator, otherwise \const{false}.

When an inter-communicator is used as an input argument to the communicator accessors described above under intra-communication, the following table describes behavior.
\vspace*{.1in}
\begin{table}[h]
\begin{center}
\begin{tabular}{|l|p{3.0in}|}
% \mpiiidotiMergeFromREVIEWbegin{10.e}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
% \hline
% \multicolumn{2}{|c|}{\func{ MPI\_COMM\_*} Function Behavior} \\
% \multicolumn{2}{|c|}{(in Inter-Communication Mode)}\\
% \hline
% \mpiiidotiMergeFromREVIEWendII{10.e}% MPI-2.1 End of review based correction
\hline
\func{MPI\_COMM\_SIZE} & returns the size of the local group. \\
\func{MPI\_COMM\_GROUP} & returns the local group. \\
\func{MPI\_COMM\_RANK} & returns the rank in the local group. \\
\hline
\end{tabular}
\end{center}
\caption{%
\mpiiidotiMergeFromREVIEWbegin{10.e}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
\mpiskipfunc{MPI\_COMM\_*} Function Behavior (in Inter-Communication Mode)
\mpiiidotiMergeFromREVIEWendII{10.e}% MPI-2.1 End of review based correction
}
\label{table:context:inter:size}
\end{table}
\noindent
Furthermore, the operation \func{MPI\_COMM\_COMPARE} is valid for inter-communicators. Both communicators must be either intra- or inter-communicators, or else \const{MPI\_UNEQUAL} results. Both corresponding local and remote groups must compare correctly to get the results \const{MPI\_CONGRUENT} and \const{MPI\_SIMILAR}. In particular, it is possible for \const{MPI\_SIMILAR} to result because either the local or remote groups were similar but not identical.
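The following non-normative fragment sketches how a library routine might branch on the kind of communicator it was handed; \mpiarg{comm} is assumed to be any valid communicator, and the remote-group accessor used in the inter-communicator branch is defined below.
\begin{verbatim}
/* Non-normative sketch: query a communicator of unknown kind */
int is_inter, local_size;
MPI_Comm_test_inter(comm, &is_inter);
MPI_Comm_size(comm, &local_size);    /* size of the local group */
if (is_inter) {
  int remote_size;
  MPI_Comm_remote_size(comm, &remote_size);
}
\end{verbatim}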
The following accessors provide consistent access to the remote group of an inter-communicator:
%%%%%%%%%%%%%
The following are all local operations.
\begin{funcdef}{MPI\_COMM\_REMOTE\_SIZE(comm, size)}
\funcarg{\IN}{comm}{inter-communicator (handle)}
\funcarg{\OUT}{size}{number of processes in the remote group of \mpiarg{comm} (integer)}
\end{funcdef}
\mpibind{MPI\_Comm\_remote\_size(MPI\_Comm~comm, int~*size)}
\mpifbind{MPI\_COMM\_REMOTE\_SIZE(COMM, SIZE, IERROR)\fargs INTEGER COMM, SIZE, IERROR}
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Intercomm::Get\_remote\_size() const}{int}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1
\begin{funcdef}{MPI\_COMM\_REMOTE\_GROUP(comm, group)}
\funcarg{\IN}{comm}{inter-communicator (handle)}
\funcarg{\OUT}{group}{remote group corresponding to \mpiarg{comm} (handle)}
\end{funcdef}
\mpibind{MPI\_Comm\_remote\_group(MPI\_Comm~comm, MPI\_Group~*group)}
\mpifbind{MPI\_COMM\_REMOTE\_GROUP(COMM, GROUP, IERROR)\fargs INTEGER COMM, GROUP, IERROR}
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Intercomm::Get\_remote\_group() const}{MPI::Group}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1
\begin{rationale}
Symmetric access to both the local and remote groups of an inter-communicator is important, so this function, as well as \func{MPI\_COMM\_REMOTE\_SIZE}, has been provided.
\end{rationale}
%%%%%%%%%%%%%
\subsection{Inter-communicator Operations}
% Passed: 12-3-9
\label{subsec:context-intercomm}
This section introduces four blocking inter-communicator operations. \mpifunc{MPI\_INTERCOMM\_CREATE} is used to bind
%mansplit
two intra-communicators into an in\-ter-com\-mun\-i\-ca\-tor; the function \mpifunc{MPI\_INTERCOMM\_MERGE} creates an intra-communicator by merging the local and remote groups of an inter-communicator. The functions \linebreak[3]\mpifunc{MPI\_COMM\_DUP} and \linebreak[3]\mpifunc{MPI\_COMM\_FREE}, introduced previously, duplicate and free an inter-communicator, respectively.

Overlap of local and remote groups that are bound into an inter-communicator is prohibited. If there is overlap, then the program is erroneous and is likely to deadlock. (If a process is multithreaded, and \MPI/ calls block only a thread, rather than a process, then ``dual membership'' can be supported. It is then the user's responsibility to make sure that calls on behalf of the two ``roles'' of a process are executed by two independent threads.)

The function \mpifunc{MPI\_INTERCOMM\_CREATE} can be used to create an inter-communicator from two existing intra-communicators, in the following situation: At least one selected member from each group (the ``group leader'') has the ability to communicate with the selected member from the other group; that is, a ``peer'' communicator exists to which both leaders belong, and each leader knows the rank of the other leader in
%MPI-1.2 and MPI-1.2-review-2008.03.13
this peer communicator\DELETE{MPI-2, p.\ 25}{ (the two leaders could be the same process)}. Furthermore, members of each group know the rank of their leader.

Construction of an inter-communicator from two intra-communicators requires separate collective operations in the local group and in the remote group, as well as a point-to-point communication between a process in the local group and a process in the remote group.
In standard \MPI/ implementations (with static process allocation at initialization), the \mpifunc{MPI\_COMM\_WORLD} communicator (or preferably a dedicated duplicate thereof) can be this peer communicator.
\mpiiidotiMergeFromREVIEWbegin{9.i}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
% In dynamic \MPI/
% implementations, where, for example, a process may spawn new child
% processes during an \MPI/ execution, the parent process may be the
% ``bridge'' between the old communication universe and the new
% communication world that includes the parent and its children.
For applications that have used spawn or join, it may be necessary to first create an intracommunicator to be used as peer.
\mpiiidotiMergeFromREVIEWendI{9.i}% MPI-2.1 End of review based correction

The application topology functions described in chapter~\ref{chap:topol} do not apply to inter-communicators. Users who require this capability should utilize \func{MPI\_INTERCOMM\_MERGE} to build an intra-communicator, then apply the graph or cartesian topology capabilities to that intra-communicator, creating an appropriate topology-oriented intra-communicator. Alternatively, it may be reasonable to devise one's own application topology mechanisms for this case, without loss of generality.
\snir
\begin{funcdef2}{MPI\_INTERCOMM\_CREATE(local\_comm, local\_leader, peer\_comm, remote\_leader, tag,}{ newintercomm)}
\funcarg{\IN}{local\_comm }{ local intra-communicator (handle)}
\funcarg{\IN}{local\_leader}{ rank of local group leader in \mpiarg{local\_comm} (integer)}
\funcarg{\IN}{peer\_comm}{ ``peer'' communicator; significant only at the \mpiarg{local\_leader} (handle)}
\funcarg{\IN}{remote\_leader}{ rank of remote group leader in \mpiarg{peer\_comm}; significant only at the \mpiarg{local\_leader} (integer)}
\funcarg{\IN}{tag}{ ``safe'' tag (integer) }
\funcarg{\OUT}{newintercomm }{ new inter-communicator (handle)}
\end{funcdef2}
\rins
\mpibind{MPI\_Intercomm\_create(MPI\_Comm~local\_comm, int~local\_leader, MPI\_Comm~peer\_comm, int~remote\_leader, int~tag, MPI\_Comm~*newintercomm)}
\mpifbind{MPI\_INTERCOMM\_CREATE(LOCAL\_COMM, LOCAL\_LEADER, PEER\_COMM, REMOTE\_LEADER, TAG, NEWINTERCOMM, IERROR)\fargs INTEGER LOCAL\_COMM, LOCAL\_LEADER, PEER\_COMM, REMOTE\_LEADER, TAG, NEWINTERCOMM, IERROR}
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Intracomm::Create\_intercomm(int~local\_leader, const MPI::Comm\&~peer\_comm, int~remote\_leader, int~tag) const}{MPI::Intercomm}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1
\noindent
This call creates an inter-communicator. It is collective over the union of the local and remote groups. Processes should provide identical \mpiarg{local\_comm} and \mpiarg{local\_leader} arguments within each group. Wildcards are not permitted for \mpiarg{remote\_leader, local\_leader}, and \mpiarg{tag}.

This call uses point-to-point communication with communicator \mpiarg{peer\_comm}, and with tag \mpiarg{tag} between the leaders. Thus, care must be taken that there be no pending communication on \mpiarg{peer\_comm} that could interfere with this communication.
\begin{users}
We recommend using a dedicated peer communicator, such as a duplicate of \const{MPI\_COMM\_WORLD}, so that leader-to-leader communication cannot interfere with other communication pending on the peer communicator.
\end{users}
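A non-normative sketch of this recommendation follows; {\tt local\_comm}, {\tt local\_leader}, and {\tt remote\_leader} are assumed to have been established as described above.
\begin{verbatim}
/* Non-normative sketch: a dedicated duplicate of MPI_COMM_WORLD
   serves as the peer communicator. */
MPI_Comm peer_comm, newintercomm;
MPI_Comm_dup(MPI_COMM_WORLD, &peer_comm);
MPI_Intercomm_create(local_comm, local_leader, peer_comm,
                     remote_leader, 0 /* tag */, &newintercomm);
MPI_Comm_free(&peer_comm);  /* the new inter-communicator remains valid */
\end{verbatim}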
\begin{funcdef}{MPI\_INTERCOMM\_MERGE(intercomm, high, newintracomm)}
\funcarg{\IN}{intercomm}{inter-communicator (handle)}
\funcarg{\IN}{high}{(logical)}
\funcarg{\OUT}{newintracomm}{new intra-communicator (handle)}
\end{funcdef}
\mpibind{MPI\_Intercomm\_merge(MPI\_Comm~intercomm, int~high, MPI\_Comm~*newintracomm)}
\mpifbind{MPI\_INTERCOMM\_MERGE(INTERCOMM, HIGH, NEWINTRACOMM, IERROR)\fargs INTEGER INTERCOMM, NEWINTRACOMM, IERROR \\ LOGICAL HIGH}
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - from appendix-c++.tex via cpp-mpi1-add-to-tex-source.ed
\mpicppemptybind{MPI::Intercomm::Merge(bool~high) const}{MPI::Intracomm}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1
\noindent
This function creates an intra-communicator from the union of the two groups that are associated with \mpiarg{intercomm}. All processes should provide the same \mpiarg{high} value within each of the two groups. If processes in one group provided the value \mpiarg{high = false} and processes in the other group provided the value \mpiarg{high = true} then the union orders the ``low'' group before the ``high'' group. If all processes provided the same \mpiarg{high} argument then the order of the union is arbitrary. This call is blocking and collective within the union of the two groups.
%MPI-1.2
\ADD{MPI-2, p.\ 26}{
The error handler on the new intracommunicator in each process is inherited from the communicator that contributes the local group. Note that this can result in different processes in the same communicator having different error handlers.
}
\begin{implementors}
The implementation of \func{MPI\_INTERCOMM\_MERGE}, \linebreak \func{MPI\_COMM\_FREE} and \func{MPI\_COMM\_DUP} are similar to the implementation of \linebreak \func{MPI\_INTERCOMM\_CREATE}, except that contexts private to the input in\-ter-\-com\-mun\-i\-ca\-tor are used for communication between group leaders rather than contexts inside a bridge communicator.
\end{implementors}
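As a non-normative illustration, assume each process holds a flag {\tt am\_server} (an assumption of this sketch) whose value is 0 throughout one group of \mpiarg{intercomm} and 1 throughout the other; the merged intra-communicator then ranks the ``server'' group after the ``client'' group.
\begin{verbatim}
/* Non-normative sketch: merge an inter-communicator; the group
   supplying high = 1 (am_server, assumed) is ordered last. */
MPI_Comm newintracomm;
int newrank;
MPI_Intercomm_merge(intercomm, am_server, &newintracomm);
MPI_Comm_rank(newintracomm, &newrank);
\end{verbatim}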
\subsection{Inter-Communication Examples}
\subsubsection{Example 1: Three-Group ``Pipeline"}
\label{context-ex7}
% \begin{figure}
% \centerline{\hbox{
% \psfig{figure=figures/context-fig-1.ps,width=4.0in}}}
% \caption{Three-group pipeline.}
% \end{figure}
\begin{figure}
\center
\includegraphics[width=4.0in]{figures/context-fig-1}
\caption{Three-group pipeline.}
\end{figure}
%\begin{verbatim}
%
%+---------+         +---------+         +---------+
%|         |         |         |         |         |
%| Group 0 | <-----> | Group 1 | <-----> | Group 2 |
%|         |         |         |         |         |
%+---------+         +---------+         +---------+
%
%\end{verbatim}
\noindent
Groups 0 and 1 communicate. Groups 1 and 2 communicate. Therefore, group 0 requires one inter-communicator, group 1 requires two inter-communicators, and group 2 requires one inter-communicator.
\begin{verbatim}
main(int argc, char **argv)
{
  MPI_Comm   myComm;       /* intra-communicator of local sub-group */
  MPI_Comm   myFirstComm;  /* inter-communicator */
  MPI_Comm   mySecondComm; /* second inter-communicator (group 1 only) */
  int membershipKey;
  int rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* User code must generate membershipKey in the range [0, 1, 2] */
  membershipKey = rank % 3;

  /* Build intra-communicator for local sub-group */
  MPI_Comm_split(MPI_COMM_WORLD, membershipKey, rank, &myComm);

  /* Build inter-communicators.  Tags are hard-coded. */
  if (membershipKey == 0)
  {                     /* Group 0 communicates with group 1. */
    MPI_Intercomm_create( myComm, 0, MPI_COMM_WORLD, 1,
                          1, &myFirstComm);
  }
  else if (membershipKey == 1)
  {              /* Group 1 communicates with groups 0 and 2. */
    MPI_Intercomm_create( myComm, 0, MPI_COMM_WORLD, 0,
                          1, &myFirstComm);
    MPI_Intercomm_create( myComm, 0, MPI_COMM_WORLD, 2,
                          12, &mySecondComm);
  }
  else if (membershipKey == 2)
  {                     /* Group 2 communicates with group 1. */
    MPI_Intercomm_create( myComm, 0, MPI_COMM_WORLD, 1,
                          12, &myFirstComm);
  }

  /* Do work ... */

  switch(membershipKey)  /* free communicators appropriately */
  {
  case 1:
     MPI_Comm_free(&mySecondComm);
     /* fall through: group 1 also frees myFirstComm */
  case 0:
  case 2:
     MPI_Comm_free(&myFirstComm);
     break;
  }

  MPI_Finalize();
}
\end{verbatim}
\subsubsection{Example 2: Three-Group ``Ring"}
\label{context-ex8}
% \begin{figure}
% \centerline{\hbox{
% \psfig{figure=figures/context-fig-2.ps,width=4.0in}}}
% \caption{Three-group ring.}
% \end{figure}
\begin{figure}
\center
\includegraphics[width=4.0in]{figures/context-fig-2}
\caption{Three-group ring.}
\end{figure}
%\begin{verbatim}
%+-----------------------------------------------------------+
%|                                                           |
%|   +---------+       +---------+       +---------+         |
%|   |         |       |         |       |         |         |
%+--> | Group 0 | <-----> | Group 1 | <-----> | Group 2 | <--+
%    |         |       |         |       |         |
%    +---------+       +---------+       +---------+
%\end{verbatim}
Groups 0 and 1 communicate. Groups 1 and 2 communicate. Groups 0 and 2 communicate. Therefore, each requires two inter-communicators.
\begin{verbatim}
main(int argc, char **argv)
{
  MPI_Comm   myComm;       /* intra-communicator of local sub-group */
  MPI_Comm   myFirstComm;  /* inter-communicators */
  MPI_Comm   mySecondComm;
  MPI_Status status;
  int membershipKey;
  int rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  ...

  /* User code must generate membershipKey in the range [0, 1, 2] */
  membershipKey = rank % 3;

  /* Build intra-communicator for local sub-group */
  MPI_Comm_split(MPI_COMM_WORLD, membershipKey, rank, &myComm);

  /* Build inter-communicators.  Tags are hard-coded. */
  if (membershipKey == 0)
  {              /* Group 0 communicates with groups 1 and 2. */
    MPI_Intercomm_create( myComm, 0, MPI_COMM_WORLD, 1,
                          1, &myFirstComm);
    MPI_Intercomm_create( myComm, 0, MPI_COMM_WORLD, 2,
                          2, &mySecondComm);
  }
  else if (membershipKey == 1)
  {              /* Group 1 communicates with groups 0 and 2. */
    MPI_Intercomm_create( myComm, 0, MPI_COMM_WORLD, 0,
                          1, &myFirstComm);
    MPI_Intercomm_create( myComm, 0, MPI_COMM_WORLD, 2,
                          12, &mySecondComm);
  }
  else if (membershipKey == 2)
  {              /* Group 2 communicates with groups 0 and 1. */
    MPI_Intercomm_create( myComm, 0, MPI_COMM_WORLD, 0,
                          2, &myFirstComm);
    MPI_Intercomm_create( myComm, 0, MPI_COMM_WORLD, 1,
                          12, &mySecondComm);
  }

  /* Do some work ... */

  /* Then free communicators before terminating... */
  MPI_Comm_free(&myFirstComm);
  MPI_Comm_free(&mySecondComm);
  MPI_Comm_free(&myComm);
  MPI_Finalize();
}
\end{verbatim}
\subsubsection{Example 3: Building Name Service for Intercommunication}
\label{ex:comm-namesrvr}
The following procedures exemplify the process by which a user could create a name service for building intercommunicators via a rendezvous involving a server communicator, and a tag name selected by both groups.

After all \MPI/ processes execute \func{MPI\_INIT}, every process calls the example function, \mpiskipfunc{Init\_server()}, defined below. Then, if the \mpiarg{new\_world} returned is \const{MPI\_COMM\_NULL}, the process getting \const{MPI\_COMM\_NULL} is required to implement a server function, in a reactive loop, \mpiskipfunc{Do\_server()}. Everyone else just does their prescribed computation, using \mpiarg{new\_world} as the new effective ``global" communicator. One designated process calls \mpiskipfunc{Undo\_server()} to get rid of the server when it is not needed any longer.
Features of this approach include:
\begin{itemize}
\item Support for multiple name servers
\item Ability to scope the name servers to specific processes
\item Ability to make such servers come and go as desired.
\end{itemize}
\begin{verbatim}
#define INIT_SERVER_TAG_1  666
#define UNDO_SERVER_TAG_1  777

static int server_keyval = MPI_KEYVAL_INVALID;

/* attribute management for server_comm; copy callback
   (of type MPI_Copy_function): */
int handle_copy_fn(MPI_Comm oldcomm, int keyval, void *extra_state,
                   void *attribute_val_in, void *attribute_val_out,
                   int *flag)
{
   /* copy the handle */
   *(void **)attribute_val_out = attribute_val_in;
   *flag = 1; /* indicate that copy is to happen */
   return (MPI_SUCCESS);
}

int Init_server(peer_comm, rank_of_server, server_comm, new_world)
MPI_Comm peer_comm;
int rank_of_server;
MPI_Comm *server_comm;
MPI_Comm *new_world;    /* new effective world, sans server */
{
    MPI_Comm temp_comm, lone_comm;
    MPI_Group peer_group, temp_group;
    int rank_in_peer_comm, size, color, key = 0;
    int peer_leader, peer_leader_rank_in_temp_comm;

    MPI_Comm_rank(peer_comm, &rank_in_peer_comm);
    MPI_Comm_size(peer_comm, &size);

    if ((size < 2) || (0 > rank_of_server) || (rank_of_server >= size))
        return (MPI_ERR_OTHER);

    /* create two communicators, by splitting peer_comm
       into the server process, and everyone else */

    peer_leader = (rank_of_server + 1) % size;  /* arbitrary choice */

    if ((color = (rank_in_peer_comm == rank_of_server)))
    {
        MPI_Comm_split(peer_comm, color, key, &lone_comm);

        MPI_Intercomm_create(lone_comm, 0, peer_comm, peer_leader,
                             INIT_SERVER_TAG_1, server_comm);

        MPI_Comm_free(&lone_comm);
        *new_world = MPI_COMM_NULL;
    }
    else
    {
        MPI_Comm_split(peer_comm, color, key, &temp_comm);

        MPI_Comm_group(peer_comm, &peer_group);
        MPI_Comm_group(temp_comm, &temp_group);
        MPI_Group_translate_ranks(peer_group, 1, &peer_leader,
                        temp_group, &peer_leader_rank_in_temp_comm);

        MPI_Intercomm_create(temp_comm, peer_leader_rank_in_temp_comm,
                             peer_comm, rank_of_server,
                             INIT_SERVER_TAG_1, server_comm);

        /* attach new_world communication attribute to server_comm: */

        /* CRITICAL SECTION FOR MULTITHREADING */
        if(server_keyval == MPI_KEYVAL_INVALID)
        {
            /* acquire the process-local name for the server keyval */
            MPI_Keyval_create(handle_copy_fn, NULL,
                              &server_keyval, NULL);
        }

        *new_world = temp_comm;

        /* Cache handle of intra-communicator on inter-communicator: */
        MPI_Attr_put(server_comm, server_keyval, (void *)(*new_world));
    }

    return (MPI_SUCCESS);
}
\end{verbatim}
The actual server process would commit to running the following code:
\begin{verbatim}
int Do_server(server_comm)
MPI_Comm server_comm;
{
    void init_queue();
    int  en_queue(), de_queue();  /* keep triplets of integers
                                     for later matching (fns not shown) */

    MPI_Comm comm;
    MPI_Status status;
    int client_tag, client_source;
    int client_rank_in_new_world, pairs_rank_in_new_world;
    int pairs_rank_in_server;
    int buffer[10], count = 1;

    void *queue;
    init_queue(&queue);

    for (;;)
    {
        MPI_Recv(buffer, count, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 server_comm, &status);  /* accept from any client */

        /* determine client: */
        client_tag = status.MPI_TAG;
        client_source = status.MPI_SOURCE;
        client_rank_in_new_world = buffer[0];

        if (client_tag == UNDO_SERVER_TAG_1)  /* client that
                                                 terminates server */
        {
            while (de_queue(queue, MPI_ANY_TAG,
                            &pairs_rank_in_new_world,
                            &pairs_rank_in_server))
                ;

            MPI_Comm_free(&server_comm);
            break;
        }

        if (de_queue(queue, client_tag, &pairs_rank_in_new_world,
                     &pairs_rank_in_server))
        {
            /* matched pair with same tag, tell them
               about each other! */
            buffer[0] = pairs_rank_in_new_world;
            MPI_Send(buffer, 1, MPI_INT, client_source, client_tag,
                     server_comm);

            buffer[0] = client_rank_in_new_world;
            MPI_Send(buffer, 1, MPI_INT, pairs_rank_in_server,
                     client_tag, server_comm);
        }
        else
            en_queue(queue, client_tag, client_source,
                     client_rank_in_new_world);
    }

    return (MPI_SUCCESS);
}
\end{verbatim}
A particular process would be responsible for ending the server when it is no longer needed. Its call to \mpiskipfunc{Undo\_server} would terminate the server function.
\begin{verbatim}
int Undo_server(server_comm)   /* example client that ends server */
MPI_Comm *server_comm;
{
    int buffer = 0;
    MPI_Send(&buffer, 1, MPI_INT, 0, UNDO_SERVER_TAG_1, *server_comm);
    MPI_Comm_free(server_comm);

    return (MPI_SUCCESS);
}
\end{verbatim}
The following is a blocking name-service for inter-communication, with the same semantic restrictions as \func{MPI\_Intercomm\_create}, but simplified syntax. It uses the functionality just defined to create the name service.
\begin{verbatim}
int Intercomm_name_create(local_comm, server_comm, tag, comm)
MPI_Comm local_comm, server_comm;
int tag;
MPI_Comm *comm;
{
    int error;
    int found;    /* attribute acquisition mgmt for new_world */
                  /* comm in server_comm */
    void *val;
    MPI_Status status;

    MPI_Comm new_world;

    int buffer[10], rank;
    int local_leader = 0;

    MPI_Attr_get(server_comm, server_keyval, &val, &found);
    new_world = (MPI_Comm)val;  /* retrieve cached handle */

    MPI_Comm_rank(server_comm, &rank);  /* rank in local group */

    if (rank == local_leader)
    {
        buffer[0] = rank;
        MPI_Send(buffer, 1, MPI_INT, 0, tag, server_comm);
        MPI_Recv(buffer, 1, MPI_INT, 0, tag, server_comm, &status);
    }

    error = MPI_Intercomm_create(local_comm, local_leader, new_world,
                                 buffer[0], tag, comm);

    return(error);
}
\end{verbatim}
\section{Caching}
% Passed: 17-0-3
\label{sec:caching}
\MPI/ provides a ``caching'' facility that allows an application to attach arbitrary pieces of information, called {\bf attributes}, to
\mpiiidotiMergeNEWforSINGLEbegin% MPI-2.1 round-two - begin of modification
% communicators. More precisely, the caching
three kinds of MPI objects: communicators, windows, and datatypes. More precisely, the caching
\mpiiidotiMergeNEWforSINGLEendI% MPI-2.1 round-two - end of modification
facility allows a portable library to do the following:
\begin{itemize}
\item
pass information between calls by associating it with an \MPI/ intra- or in\-ter-\-com\-mun\-i\-ca\-tor,
\mpiiidotiMergeNEWforSINGLEbegin% MPI-2.1 round-two - begin of modification
window or datatype,
\mpiiidotiMergeNEWforSINGLEendI% MPI-2.1 round-two - end of modification
\item quickly retrieve that information, and
\item be guaranteed that out-of-date information is never retrieved, even if
\mpiiidotiMergeNEWforSINGLEbegin% MPI-2.1 round-two - begin of modification
% the communicator is freed and its handle subsequently reused by \MPI/.
the object is freed and its handle subsequently reused by \MPI/.
\mpiiidotiMergeNEWforSINGLEendI% MPI-2.1 round-two - end of modification
\end{itemize}
The caching capabilities, in some form, are required by built-in \MPI/ routines such as collective communication and application topology. Defining an interface to these capabilities as part of the \MPI/ standard is valuable because it permits routines like collective communication and application topologies to be implemented as portable code, and also because it makes \MPI/ more extensible by allowing user-written routines to use standard \MPI/ calling sequences.
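As a non-normative sketch of this round trip, using the communicator attribute functions defined later in this section, a library might cache a pointer to its private state on a communicator and recover it on a later call; {\tt my\_state\_t}, {\tt create\_state()}, and {\tt comm} are assumptions of the sketch.
\begin{verbatim}
/* Non-normative sketch: cache and later retrieve a pointer. */
static int my_keyval = MPI_KEYVAL_INVALID;
my_state_t *state;       /* library-private data (assumed type) */
void *val;
int flag;

if (my_keyval == MPI_KEYVAL_INVALID)
  MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN,
                         MPI_COMM_NULL_DELETE_FN, &my_keyval, NULL);
state = create_state();  /* assumed library routine */
MPI_Comm_set_attr(comm, my_keyval, state);

/* ... in a later call, recover the cached state ... */
MPI_Comm_get_attr(comm, my_keyval, &val, &flag);
if (flag)
  state = (my_state_t *)val;
\end{verbatim}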
\begin{users}
The communicator \const{MPI\_COMM\_SELF} is a suitable choice for posting process-local attributes, via this attribute-caching mechanism.
\end{users}
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8, p.198 l.42 - p.199 l.8 , File 2.0/ei-2.tex, lines 2212-2236
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{New Attribute Caching Functions}
\label{sec:ei-attr}
\label{sec:ei-handlecache}
Caching on communicators has been a very useful feature. In \mpiii/ it is expanded to include caching on windows and datatypes.
\begin{rationale}
At one extreme, one could allow caching on all opaque handles; at the other extreme, one could allow it only on communicators. Caching has a cost associated with it and should only be allowed when it is clearly needed and the increased cost is modest. This is the reason that windows and datatypes were added but not other handles.
\end{rationale}
One difficulty in \mpii/ is the potential for size differences between Fortran integers and C pointers. To overcome this problem with attribute caching on communicators, new functions are also given for this case. The new functions to cache on datatypes and windows also address this issue. For a general discussion of the address size problem, see Section~\ref{sec:misc-addresses}.
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
% MPI-2.1 - unused lines: MPI-2.0, Sect. 8.8, p.199 l.8 - p.199 l.11, File 2.0/ei-2.tex, lines 2237-2244 (clarification is duplicated, therefore this hint is obsolete)
\mpiiidotiMergeFromBALLOTbegin{2}{9}% MPI-2.1 Ballots 1-4
\begin{implementors}
High-quality implementations should raise an error when a keyval
\mpifuncindex{MPI\_TYPE\_CREATE\_KEYVAL}
\mpifuncindex{MPI\_COMM\_CREATE\_KEYVAL}
\mpifuncindex{MPI\_WIN\_CREATE\_KEYVAL}
that was created by a call to \mpiskipfunc{MPI\_XXX\_CREATE\_KEYVAL} is used with an object of the wrong type with a call to \mpifunc{MPI\_YYY\_GET\_ATTR}, \mpifunc{MPI\_YYY\_SET\_ATTR}, \mpifunc{MPI\_YYY\_DELETE\_ATTR}, or \mpifunc{MPI\_YYY\_FREE\_KEYVAL}. To do so, it is necessary to maintain, with each keyval, information on the type of the associated user function.
\end{implementors}
\mpiiidotiMergeFromBALLOTendII{2}{9}% MPI-2.1 Ballots 1-4
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.167 l.35 - p.168 l.28, File 1.3/context.tex, lines 2066-2129
\subsection{Functionality}
\label{subsec:context-cachefunc}
Attributes are attached to communicators. Attributes are local to the process and specific to the communicator to which they are attached. Attributes are not propagated by \MPI/ from one communicator to another except when the communicator is duplicated using \func{MPI\_COMM\_DUP} (and even then the application must give specific permission through callback functions for the attribute to be copied).
\snir
\begin{users}
Attributes in C are of type \const{void *}. Typically, such an attribute will be a pointer to a structure that contains further information, or a handle to an \MPI/ object. In Fortran, attributes are of type \const{INTEGER}. Such an attribute can be a handle to an \MPI/ object, or just an integer-valued attribute.
\end{users}
\rins
\begin{implementors}
Attributes are scalar values, equal in size to, or larger than a C-language pointer. Attributes can always hold an \MPI/ handle.
\end{implementors}
The caching interface defined here requires that attributes be stored by \MPI/ opaquely within a communicator. Accessor functions include the following:
\begin{itemize}
\item
obtain a key value (used to identify an attribute); the user specifies ``callback'' functions by which \MPI/ informs the application when the communicator is destroyed or copied, and
\item
store and retrieve the value of an attribute.
\end{itemize}
\begin{implementors}
Caching and callback functions are only called synchronously, in response to explicit application requests. This avoids problems that result from repeated crossings between user and system space. (This synchronous calling rule is a general property of \MPI/.)

The choice of key values is under control of \MPI/. This allows \MPI/ to optimize its implementation of attribute sets. It also avoids conflict between independent modules caching information on the same communicators.

A much smaller interface, consisting of just a callback facility, would allow the entire caching facility to be implemented by portable code. However, with the minimal callback interface, some form of table searching is implied by the need to handle arbitrary communicators. In contrast, the more complete interface defined here permits rapid access to attributes through the use of pointers in communicators (to find the attribute table) and cleverly chosen key values (to retrieve individual attributes). In light of the efficiency ``hit'' inherent in the minimal interface, the more complete interface defined here is seen to be superior.
\end{implementors}
\noindent
\MPI/ provides the following services related to caching. They are all process local.
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.199 l.13 - p.199 l.17, File 2.0/ei-2.tex, lines 2245-2251
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Communicators}
The new functions that are replacements for the \mpii/ functions for caching on communicators are:
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.199 l.18 - p.199 l.39, File 2.0/ei-2.tex, lines 2252-2268
\begchangefini
\begin{funcdef}{MPI\_COMM\_CREATE\_KEYVAL(comm\_copy\_attr\_fn, comm\_delete\_attr\_fn, comm\_keyval, \gb extra\_state)}
\funcarg{\IN}{comm\_copy\_attr\_fn}{copy callback function for \mpiarg{comm\_keyval} (function)}
\funcarg{\IN}{comm\_delete\_attr\_fn}{delete callback function for \mpiarg{comm\_keyval} (function)}
\funcarg{\OUT}{comm\_keyval}{key value for future access (integer)}
\funcarg{\IN}{extra\_state}{extra state for callback functions}
\end{funcdef}
\mpibind{MPI\_Comm\_create\_keyval(MPI\_Comm\_copy\_attr\_function~*comm\_copy\_attr\_fn, MPI\_Comm\_delete\_attr\_function~*comm\_delete\_attr\_fn, int~*comm\_keyval, void~*extra\_state)}
\mpifbind{MPI\_COMM\_CREATE\_KEYVAL(COMM\_COPY\_ATTR\_FN, COMM\_DELETE\_ATTR\_FN, COMM\_KEYVAL, EXTRA\_STATE, IERROR)\fargs EXTERNAL COMM\_COPY\_ATTR\_FN, COMM\_DELETE\_ATTR\_FN\\INTEGER COMM\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) EXTRA\_STATE}
\begchangefinii
\mpicppemptybind{MPI::Comm::Create\_keyval(MPI::Comm::Copy\_attr\_function* comm\_copy\_attr\_fn, MPI::Comm::Delete\_attr\_function*~comm\_delete\_attr\_fn, void*~extra\_state)}{static int}
\endchangefinii
\endchangefini
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.168 l.43 - p.168 l.46, File 1.3/context.tex, lines 2141-2145
Generates a new attribute key. Keys are locally unique in a process, and opaque to the user, though they are explicitly stored in integers. Once allocated, the key value can be used to associate attributes and access them on any locally defined communicator.
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.199 l.40 - p.199 l.43, File 2.0/ei-2.tex, lines 2269-2278
This function replaces \mpifunc{MPI\_KEYVAL\_CREATE},
\begchangefinii
whose use is deprecated.
\endchangefinii
The C binding is identical. The Fortran binding differs in that \mpiarg{extra\_state} is an address-sized integer. Also, the copy and delete callback functions have Fortran bindings that are consistent with address-sized attributes.
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.200 l. 7 - p.200 l.36, File 2.0/ei-2.tex, lines 2308-2345
\begchangefini
The C callback functions are:
\begchangefinii
\mpitypedefbind{MPI\_Comm\_copy\_attr\_function(MPI\_Comm~oldcomm, int~comm\_keyval, void~*extra\_state, void~*attribute\_val\_in, void~*attribute\_val\_out, int~*flag)}
and
\mpitypedefbind{MPI\_Comm\_delete\_attr\_function(MPI\_Comm comm, int comm\_keyval, void *attribute\_val, void *extra\_state)}
\endchangefinii
\noindent
which are the same as the \mpiidoti/ calls but with a new name.
\begchangefinii
The old names are deprecated.
\endchangefinii The Fortran callback functions are: \mpifsubbind{COMM\_COPY\_ATTR\_FN(OLDCOMM, COMM\_KEYVAL, EXTRA\_STATE, ATTRIBUTE\_VAL\_IN, ATTRIBUTE\_VAL\_OUT, FLAG, IERROR)\fargs INTEGER OLDCOMM, COMM\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) EXTRA\_STATE, ATTRIBUTE\_VAL\_IN,\\ \ \ \ \ ATTRIBUTE\_VAL\_OUT\\LOGICAL FLAG} and \mpifsubbind{COMM\_DELETE\_ATTR\_FN(COMM, COMM\_KEYVAL, ATTRIBUTE\_VAL, EXTRA\_STATE, IERROR)\fargs INTEGER COMM, COMM\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) ATTRIBUTE\_VAL, EXTRA\_STATE} The C++ callbacks are: \begchangefinii \mpicpptypedefemptybind{MPI::Comm::Copy\_attr\_function(const~MPI::Comm\&~oldcomm, int~comm\_keyval, void*~extra\_state, void*~attribute\_val\_in, void*~attribute\_val\_out, bool\&~flag)}{int} and \mpicpptypedefemptybind{MPI::Comm::Delete\_attr\_function(MPI::Comm\&~comm, int~comm\_keyval, void*~attribute\_val, void*~extra\_state)}{int} \endchangefini \endchangefinii \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines \mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.168 l.47 - p.168 l.48, File 1.3/context.tex, lines 2146-2149 The \func{comm\_copy\_attr\_fn} function is invoked when a communicator is duplicated by \mpifunc{MPI\_COMM\_DUP}. \func{comm\_copy\_attr\_fn} should be of type \const{MPI\_Comm\_copy\_attr\_function}. \mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines \mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.169 l.11 - p.169 l.16, File 1.3/context.tex, lines 2165-2178 The copy callback function is invoked for each key value in \mpiarg{oldcomm} in arbitrary order. Each call to the copy callback is made with a key value and its corresponding attribute. If it returns \const{flag = 0}, then the attribute is deleted in the duplicated communicator. Otherwise (\const{flag = 1}), \snir the new attribute value is set to the value returned in \mpiarg{attribute\_val\_out}. \rins The function returns \const{MPI\_SUCCESS} on success and an error code on failure (in which case \func{MPI\_COMM\_DUP} will fail). \mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines \mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.199 l.44 - p.200 l. 2, File 2.0/ei-2.tex, lines 2279-2296 \begchangefinii The argument \mpiarg{comm\_copy\_attr\_fn} may be specified as \mpifuncmainindex{MPI\_COMM\_NULL\_COPY\_FN} \mpiskipfunc{MPI\_COMM\_NULL\_COPY\_FN} or \mpifuncmainindex{MPI\_COMM\_DUP\_FN} \mpiskipfunc{MPI\_COMM\_DUP\_FN} from either C, C++, or Fortran. \mpifunc{MPI\_COMM\_NULL\_COPY\_FN} is a function that does nothing other than returning \mpiarg{flag = 0} and \consti{MPI\_SUCCESS}. \mpifunc{MPI\_COMM\_DUP\_FN} is a simple-minded copy function that sets \mpiarg{flag = 1}, returns the value of \mpiarg{attribute\_val\_in} in \mpiarg{attribute\_val\_out}, and returns \consti{MPI\_SUCCESS}. These replace the \mpii/ predefined callbacks \mpifunc{MPI\_NULL\_COPY\_FN} and \mpifunc{MPI\_DUP\_FN}, whose use is deprecated. \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines \mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.169 l.21 - p.169 l.35, File 1.3/context.tex, lines 2197-2222 \begin{users} Even though both formal arguments \mpiarg{attribute\_val\_in} and \mpiarg{attribute\_val\_out} are of type \const{void *}, their usage differs. 
The C copy function is passed by \MPI/ in \mpiarg{attribute\_val\_in} the {\em value} of the attribute, and in \mpiarg{attribute\_val\_out} the {\em address} of the attribute, so as to allow the function to return the (new) attribute value. The use of type \const{void *} for both is to avoid messy type casts.
\rins
A valid copy function is one that completely duplicates the information by making a full copy of the data structures implied by an attribute; another might just make another reference to that data structure, while using a reference-count mechanism. Other types of attributes might not be copied at all (they might be specific to \mpiarg{oldcomm} only).
\end{users}
\snir
\begin{implementors}
A C interface should be assumed for copy and delete functions associated with key values created in C; a Fortran calling interface should be assumed for key values created in Fortran.
\end{implementors}
\rins
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.169 l.36 - p.169 l.40, File 1.3/context.tex, lines 2223-2228
Analogous to \func{comm\_copy\_attr\_fn} is a callback deletion function, defined as follows. The \func{comm\_delete\_attr\_fn} function is invoked when a communicator is deleted by \mpifunc{MPI\_COMM\_FREE} or when a call is made explicitly to \mpifunc{MPI\_ATTR\_DELETE}. \func{comm\_delete\_attr\_fn} should be of type \const{MPI\_Comm\_delete\_attr\_function}.
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.170 l. 1 - p.170 l. 3, File 1.3/context.tex, lines 2240-2250
This function is called by \mpifunc{MPI\_COMM\_FREE}, \mpifunc{MPI\_COMM\_DELETE\_ATTR},
\snir
and \mpifunc{MPI\_COMM\_SET\_ATTR}
\rins
to do whatever is needed to remove an attribute.
\snir
The function returns \const{MPI\_SUCCESS} on success and an error code on failure (in which case \func{MPI\_COMM\_FREE} will fail).
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.200 l. 3 - p.200 l. 6, File 2.0/ei-2.tex, lines 2297-2307
\begchangefinii
The argument \mpiarg{comm\_delete\_attr\_fn} may be specified as
\endchangefinii
\mpifuncmainindex{MPI\_COMM\_NULL\_DELETE\_FN}
\mpiskipfunc{MPI\_COMM\_NULL\_DELETE\_FN} from either C, C++, or Fortran.
\mpiskipfunc{MPI\_COMM\_NULL\_DELETE\_FN} is a function that does nothing, other than returning \consti{MPI\_SUCCESS}. \mpifunc{MPI\_COMM\_NULL\_DELETE\_FN} replaces \mpifunc{MPI\_NULL\_DELETE\_FN}, whose use is deprecated.
\endchangefinii
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.170 l. 6 - p.170 l. 7, File 1.3/context.tex, lines 2257-2268
%MPI-1.2
\ADD{MPI-2, p.\ 26}{
If an attribute copy function or attribute delete function returns other than \const{MPI\_SUCCESS}, then the call that caused it to be invoked (for example, \mpifunc{MPI\_COMM\_FREE}), is erroneous.
}
The special key value \const{MPI\_KEYVAL\_INVALID} is never returned by \mpifunc{MPI\_KEYVAL\_CREATE}. Therefore, it can be used for static initialization of key values.
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.200 l.38 - p.200 l.48, File 2.0/ei-2.tex, lines 2346-2355
\begin{funcdef}{MPI\_COMM\_FREE\_KEYVAL(comm\_keyval)}
\funcarg{\INOUT}{comm\_keyval}{key value (integer)}
\end{funcdef}
\mpibind{MPI\_Comm\_free\_keyval(int *comm\_keyval)}
\mpifbind{MPI\_COMM\_FREE\_KEYVAL(COMM\_KEYVAL, IERROR)\fargs INTEGER COMM\_KEYVAL, IERROR}
\mpicppemptybind{MPI::Comm::Free\_keyval(int\& comm\_keyval)}{static void}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.170 l.17 - p.170 l.23, File 1.3/context.tex, lines 2277-2295
Frees an extant attribute key. This function sets the value of \mpiarg{keyval} to
% \linebreak
\const{MPI\_KEYVAL\_INVALID}. Note that it is not erroneous to free an attribute key that is in use, because the actual free does not transpire until after all references (in other communicators on the process) to the key have been freed. These references need to be explicitly freed by the program, either via calls to \mpifunc{MPI\_COMM\_DELETE\_ATTR} that free one attribute instance, or by calls to \mpifunc{MPI\_COMM\_FREE} that free all attribute instances associated with the freed communicator.
\snir
%\begin{implementors}The function \mpifunc{MPI\_NULL\_FN} need not be
%aliased to {\tt (void (*))0} in C, though this is fine.
%It could be a legitimately callable function that profiles and so on.
%For FORTRAN, it is most convenient to have \mpifunc{MPI\_NULL\_FN}
%be a legitimate do-nothing function call.\end{implementors}
\rins
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.201 l. 1 - p.201 l. 2, File 2.0/ei-2.tex, lines 2356-2366
%
% WDG notes:
% Why is this here? Why isn't it like MPI_ERRHANDLER_FREE? I don't believe
% the rationale.
%
This call is identical to the \mpii/ call \mpifunc{MPI\_KEYVAL\_FREE} but is needed to match the new communicator-specific creation function.
\begchangefinii
The use of \mpifunc{MPI\_KEYVAL\_FREE} is deprecated.
\endchangefinii
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.201 l.4 - p.201 l.17, File 2.0/ei-2.tex, lines 2367-2380
\begchangefini
\begin{funcdef}{MPI\_COMM\_SET\_ATTR(comm, comm\_keyval, attribute\_val)}
\funcarg{\INOUT}{comm}{communicator to which the attribute will be attached (handle)}
\funcarg{\IN}{comm\_keyval}{key value (integer)}
\funcarg{\IN}{attribute\_val}{attribute value}
\end{funcdef}
\mpibind{MPI\_Comm\_set\_attr(MPI\_Comm comm, int comm\_keyval, void *attribute\_val)}
\mpifbind{MPI\_COMM\_SET\_ATTR(COMM, COMM\_KEYVAL, ATTRIBUTE\_VAL, IERROR)\fargs INTEGER COMM, COMM\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) ATTRIBUTE\_VAL}
\mpicppemptybind{MPI::Comm::Set\_attr(int comm\_keyval, const void* attribute\_val) const}{void}
\endchangefini
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.170 l.38 - p.170 l.44, File 1.3/context.tex, lines 2306-2319
This function stores the stipulated attribute value \mpiarg{attribute\_val} for subsequent retrieval by \func{MPI\_COMM\_GET\_ATTR}. If the value is already present, then the outcome is as if \mpifunc{MPI\_COMM\_DELETE\_ATTR}
%mansplit
was first called to delete the previous value (and the callback function \mpifunc{comm\_delete\_attr\_fn} was executed), and a new value was next stored.
The call is erroneous if there is no key with value \mpiarg{keyval};
in particular \const{MPI\_KEYVAL\_INVALID} is an erroneous key value.
\snir
The call will fail if the \mpifunc{comm\_delete\_attr\_fn} function
returned an error code other than \const{MPI\_SUCCESS}.
\rins
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.201 l.18 - p.201 l.20, File 2.0/ei-2.tex, lines 2381-2388
This function replaces \mpifunc{MPI\_ATTR\_PUT},
\begchangefinii
whose use is deprecated.
\endchangefinii
The C binding is identical.  The Fortran binding differs in that
\mpiarg{attribute\_val} is an address-sized integer.
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.201 l.22 - p.201 l.38, File 2.0/ei-2.tex, lines 2389-2403
\begin{funcdef}{MPI\_COMM\_GET\_ATTR(comm, comm\_keyval, attribute\_val, flag)}
\funcarg{\IN}{comm}{communicator to which the attribute is attached (handle)}
\funcarg{\IN}{comm\_keyval}{key value (integer)}
\funcarg{\OUT}{attribute\_val}{attribute value, unless \mpiarg{flag = false}}
\funcarg{\OUT}{flag}{\consti{false} if no attribute is associated with the key (logical)}
\end{funcdef}
\mpibind{MPI\_Comm\_get\_attr(MPI\_Comm comm, int comm\_keyval, void *attribute\_val, int *flag)}
\mpifbind{MPI\_COMM\_GET\_ATTR(COMM, COMM\_KEYVAL, ATTRIBUTE\_VAL, FLAG, IERROR)\fargs INTEGER COMM, COMM\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) ATTRIBUTE\_VAL\\LOGICAL FLAG}
\mpicppemptybind{MPI::Comm::Get\_attr(int comm\_keyval, void* attribute\_val) const}{bool}
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.171 l.14 - p.171 l.28, File 1.3/context.tex, lines 2333-2360
Retrieves attribute value by key.  The call is erroneous if there is no
key with value \mpiarg{keyval}.  On the other hand, the call is correct
if the key value exists, but no attribute is attached on {\tt comm} for
that key; in such a case, the call returns {\tt flag = false}.  In
particular \const{MPI\_KEYVAL\_INVALID} is an erroneous key value.
\snir
\begin{users}
The call to \mpifunc{MPI\_Comm\_set\_attr} passes in
\mpiarg{attribute\_val} the {\em value} of the attribute; the call to
\mpifunc{MPI\_Comm\_get\_attr} passes in \mpiarg{attribute\_val} the
{\em address} of the location where the attribute value is to be
returned.  Thus, if the attribute value itself is a pointer of type
%MPI-1.2-review-2008.03.13
\const{void*}, the\DELETE{MPI-1.2-review-Rainer-2008.03.13}{ the}
actual \mpiarg{attribute\_val} parameter to
\mpifunc{MPI\_Comm\_set\_attr} will be of type \const{void*} and the
actual \mpiarg{attribute\_val} parameter to
\mpifunc{MPI\_Comm\_get\_attr} will be of type \const{void**}.
\end{users}
\begin{rationale}
The use of a formal parameter \mpiarg{attribute\_val} of type
\const{void*} (rather than \const{void**}) avoids the messy type
casting that would be needed if the attribute value is declared with a
type other than \const{void*}.
\end{rationale}
\rins
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.201 l.39 - p.201 l.41, File 2.0/ei-2.tex, lines 2404-2411
This function replaces \mpifunc{MPI\_ATTR\_GET},
\begchangefinii
whose use is deprecated.
\endchangefinii
The C binding is identical.  The Fortran binding differs in that
\mpiarg{attribute\_val} is an address-sized integer.
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
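\begin{users}
The following illustrative fragment (a sketch only; \mpiarg{comm},
\mpiarg{lib\_key}, and the \const{lib\_attr\_type} structure from the
earlier sketch are placeholders assumed to exist) makes the pointer
convention concrete: the pointer itself is passed to
\mpifunc{MPI\_Comm\_set\_attr}, while the address of a pointer is
passed to \mpifunc{MPI\_Comm\_get\_attr}.
\begin{verbatim}
lib_attr_type *attr, *attr_back;
int flag;

/* store the pointer itself as the attribute value */
MPI_Comm_set_attr(comm, lib_key, attr);

/* retrieve it: pass the address of the receiving pointer */
MPI_Comm_get_attr(comm, lib_key, &attr_back, &flag);
if (flag) {
    /* attr_back now equals attr */
}
\end{verbatim}
\end{users}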
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.202 l.1 - p.202 l.11, File 2.0/ei-2.tex, lines 2412-2426
\begin{funcdef}{MPI\_COMM\_DELETE\_ATTR(comm, comm\_keyval)}
\funcarg{\INOUT}{comm}{communicator from which the attribute is deleted (handle)}
\funcarg{\IN}{comm\_keyval}{key value (integer)}
\end{funcdef}
\mpibind{MPI\_Comm\_delete\_attr(MPI\_Comm comm, int comm\_keyval)}
\begchangefinii
\mpifbind{MPI\_COMM\_DELETE\_ATTR(COMM, COMM\_KEYVAL, IERROR)\fargs INTEGER COMM, COMM\_KEYVAL, IERROR}
\endchangefinii
\begchangefinii
\mpicppemptybind{MPI::Comm::Delete\_attr(int comm\_keyval)}{void}
\endchangefinii
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.171 l.40 - p.171 l.47, File 1.3/context.tex, lines 2370-2385
Delete attribute from cache by key.  This function invokes the
attribute delete function \mpiarg{comm\_delete\_attr\_fn} specified
when the \mpiarg{keyval} was created.
\snir
The call will fail if the \mpiarg{comm\_delete\_attr\_fn} function
returns an error code other than \const{MPI\_SUCCESS}.
\rins
Whenever a communicator is replicated using the function
\mpifunc{MPI\_COMM\_DUP}, all callback copy functions for attributes
that are currently set are invoked (in arbitrary order).  Whenever a
communicator is deleted using the function \mpifunc{MPI\_COMM\_FREE},
all callback delete functions for attributes that are currently set are
invoked.
%\end{users}
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.1, p.202 l.12 - p.202 l.14, File 2.0/ei-2.tex, lines 2427-2432
This function is the same as \mpifunc{MPI\_ATTR\_DELETE} but is needed
to match the new communicator-specific functions.
\begchangefinii
The use of \mpifunc{MPI\_ATTR\_DELETE} is deprecated.
\endchangefinii
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect.
% 8.8.2, p.202 l.15 - p.203 l.30, File 2.0/ei-2.tex, lines 2433-2512
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Windows}

The new functions for caching on windows are:
\begchangefini
\begin{funcdef}{MPI\_WIN\_CREATE\_KEYVAL(win\_copy\_attr\_fn, win\_delete\_attr\_fn, win\_keyval, extra\_state)}
\funcarg{\IN}{win\_copy\_attr\_fn}{copy callback function for \mpiarg{win\_keyval} (function)}
\funcarg{\IN}{win\_delete\_attr\_fn}{delete callback function for \mpiarg{win\_keyval} (function)}
\funcarg{\OUT}{win\_keyval}{key value for future access (integer)}
\funcarg{\IN}{extra\_state}{extra state for callback functions}
\end{funcdef}
\mpibind{MPI\_Win\_create\_keyval(MPI\_Win\_copy\_attr\_function~*win\_copy\_attr\_fn, MPI\_Win\_delete\_attr\_function~*win\_delete\_attr\_fn, int~*win\_keyval, void~*extra\_state)}
\mpifbind{MPI\_WIN\_CREATE\_KEYVAL(WIN\_COPY\_ATTR\_FN, WIN\_DELETE\_ATTR\_FN, WIN\_KEYVAL, EXTRA\_STATE, IERROR)\fargs EXTERNAL WIN\_COPY\_ATTR\_FN, WIN\_DELETE\_ATTR\_FN\\INTEGER WIN\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) EXTRA\_STATE}
\begchangefinii
\begchangefiniii
\mpicppemptybind{MPI::Win::Create\_keyval(MPI::Win::Copy\_attr\_function* win\_copy\_attr\_fn, MPI::Win::Delete\_attr\_function*~win\_delete\_attr\_fn, void*~extra\_state)}{static int}
\endchangefiniii
\endchangefinii
\begchangefinii
The argument \mpiarg{win\_copy\_attr\_fn} may be specified as
\mpifuncmainindex{MPI\_WIN\_NULL\_COPY\_FN}\mpiskipfunc{MPI\_WIN\_NULL\_COPY\_FN} or
\mpifuncmainindex{MPI\_WIN\_DUP\_FN}\mpiskipfunc{MPI\_WIN\_DUP\_FN}
from either C, C++, or Fortran.
\mpifunc{MPI\_WIN\_NULL\_COPY\_FN} is a function that does nothing
other than returning \mpiarg{flag = 0} and \consti{MPI\_SUCCESS}.
\mpifunc{MPI\_WIN\_DUP\_FN} is a simple-minded copy function that sets
\mpiarg{flag = 1}, returns the value of \mpiarg{attribute\_val\_in} in
\mpiarg{attribute\_val\_out}, and returns \consti{MPI\_SUCCESS}.
\begchangefinii
The argument \mpiarg{win\_delete\_attr\_fn} may be specified as
\endchangefinii
\mpifuncmainindex{MPI\_WIN\_NULL\_DELETE\_FN}\mpiskipfunc{MPI\_WIN\_NULL\_DELETE\_FN}
from either C, C++, or Fortran.
\mpiskipfunc{MPI\_WIN\_NULL\_DELETE\_FN} is a function that does
nothing, other than returning \consti{MPI\_SUCCESS}.
\endchangefinii The C callback functions are: \begchangefinii \mpitypedefbind{MPI\_Win\_copy\_attr\_function(MPI\_Win~oldwin, int~win\_keyval, void~*extra\_state, void~*attribute\_val\_in, void~*attribute\_val\_out, int~*flag)} and \mpitypedefbind{MPI\_Win\_delete\_attr\_function(MPI\_Win~win, int~win\_keyval, void~*attribute\_val, void~*extra\_state)} \endchangefinii The Fortran callback functions are: \mpifsubbind{WIN\_COPY\_ATTR\_FN(OLDWIN, WIN\_KEYVAL, EXTRA\_STATE, ATTRIBUTE\_VAL\_IN, ATTRIBUTE\_VAL\_OUT, FLAG, IERROR)\fargs INTEGER OLDWIN, WIN\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) EXTRA\_STATE, ATTRIBUTE\_VAL\_IN,\\ \ \ \ \ ATTRIBUTE\_VAL\_OUT\\LOGICAL FLAG} and \mpifsubbind{WIN\_DELETE\_ATTR\_FN(WIN, WIN\_KEYVAL, ATTRIBUTE\_VAL, EXTRA\_STATE, IERROR)\fargs INTEGER WIN, WIN\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) ATTRIBUTE\_VAL, EXTRA\_STATE} The C++ callbacks are: \begchangefinii \mpicpptypedefemptybind{MPI::Win::Copy\_attr\_function(const~MPI::Win\&~oldwin, int~win\_keyval, void*~extra\_state, void*~attribute\_val\_in, void*~attribute\_val\_out, bool\&~flag)}{int} and \mpicpptypedefemptybind{MPI::Win::Delete\_attr\_function(MPI::Win\&~win, int~win\_keyval, void*~attribute\_val, void*~extra\_state)}{int} \endchangefini \endchangefinii \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines \mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 2.3.8, page 26, lines 38-39, File 2.0/misc-1.2.tex, lines 589-592 If an attribute copy function or attribute delete function returns other than \consti{MPI\_SUCCESS}, then the call that caused it to be invoked (for example, \mpifunc{MPI\_WIN\_FREE}), is erroneous. \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines \mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 
% 8.8.2, p.203 l.31 - p.204 l.44, File 2.0/ei-2.tex, lines 2513-2568
\begin{funcdef}{MPI\_WIN\_FREE\_KEYVAL(win\_keyval)}
\funcarg{\INOUT}{win\_keyval}{key value (integer)}
\end{funcdef}
\mpibind{MPI\_Win\_free\_keyval(int *win\_keyval)}
\mpifbind{MPI\_WIN\_FREE\_KEYVAL(WIN\_KEYVAL, IERROR)\fargs INTEGER WIN\_KEYVAL, IERROR}
\begchangefiniii
\mpicppemptybind{MPI::Win::Free\_keyval(int\& win\_keyval)}{static void}
\endchangefiniii
\begchangefini
\begin{funcdef}{MPI\_WIN\_SET\_ATTR(win, win\_keyval, attribute\_val)}
\funcarg{\INOUT}{win}{window to which attribute will be attached (handle)}
\funcarg{\IN}{win\_keyval}{key value (integer)}
\funcarg{\IN}{attribute\_val}{attribute value}
\end{funcdef}
\mpibind{MPI\_Win\_set\_attr(MPI\_Win win, int win\_keyval, void *attribute\_val)}
\mpifbind{MPI\_WIN\_SET\_ATTR(WIN, WIN\_KEYVAL, ATTRIBUTE\_VAL, IERROR)\fargs INTEGER WIN, WIN\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) ATTRIBUTE\_VAL}
\mpicppemptybind{MPI::Win::Set\_attr(int win\_keyval, const void* attribute\_val)}{void}
\endchangefini
\begin{funcdef}{MPI\_WIN\_GET\_ATTR(win, win\_keyval, attribute\_val, flag)}
\funcarg{\IN}{win}{window to which the attribute is attached (handle)}
\funcarg{\IN}{win\_keyval}{key value (integer)}
\funcarg{\OUT}{attribute\_val}{attribute value, unless \mpiarg{flag = false}}
\funcarg{\OUT}{flag}{\consti{false} if no attribute is associated with the key (logical)}
\end{funcdef}
\mpibind{MPI\_Win\_get\_attr(MPI\_Win~win, int~win\_keyval, void~*attribute\_val, int~*flag)}
\mpifbind{MPI\_WIN\_GET\_ATTR(WIN, WIN\_KEYVAL, ATTRIBUTE\_VAL, FLAG, IERROR)\fargs INTEGER WIN, WIN\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) ATTRIBUTE\_VAL\\LOGICAL FLAG}
\mpiiidotiMergeFromBALLOTbegin{1}{20b}% MPI-2.1 Ballots 1-4
% \mpicppemptybind{MPI::Win::Get\_attr(const~MPI::Win\&~win, int~win\_keyval, void*~attribute\_val) const}{bool}
\mpicppemptybind{MPI::Win::Get\_attr(int~win\_keyval, void*~attribute\_val) const}{bool}
\mpiiidotiMergeFromBALLOTendII{1}{20b}% MPI-2.1 Ballots 1-4
\begin{funcdef}{MPI\_WIN\_DELETE\_ATTR(win, win\_keyval)}
\funcarg{\INOUT}{win}{window from which the attribute is deleted (handle)}
\funcarg{\IN}{win\_keyval}{key value (integer)}
\end{funcdef}
\mpibind{MPI\_Win\_delete\_attr(MPI\_Win win, int win\_keyval)}
\begchangefinii
\mpifbind{MPI\_WIN\_DELETE\_ATTR(WIN, WIN\_KEYVAL, IERROR)\fargs INTEGER WIN, WIN\_KEYVAL, IERROR}
\endchangefinii
\begchangefinii
\mpicppemptybind{MPI::Win::Delete\_attr(int win\_keyval)}{void}
\endchangefinii
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
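\begin{users}
The window functions mirror the communicator case.  The following
illustrative fragment (a sketch only; \mpiarg{win} and all other names
are placeholders) caches a pointer to library-specific data on a
window, using the predefined do-nothing callbacks:
\begin{verbatim}
static int win_key = MPI_KEYVAL_INVALID;
void *state;       /* library-specific data for this window */
void *state_back;
int   flag;

MPI_Win_create_keyval(MPI_WIN_NULL_COPY_FN, MPI_WIN_NULL_DELETE_FN,
                      &win_key, (void *)0);
MPI_Win_set_attr(win, win_key, state);
MPI_Win_get_attr(win, win_key, &state_back, &flag);
/* flag is true and state_back equals state */
\end{verbatim}
\end{users}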
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 8.8.3, p.204 l.45 - p.206 l.12, File 2.0/ei-2.tex, lines 2569-2644
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Datatypes}
\label{subsec:caching:datatypes}

The new functions for caching on datatypes are:
\begchangefini
\begin{funcdef}{MPI\_TYPE\_CREATE\_KEYVAL(type\_copy\_attr\_fn, type\_delete\_attr\_fn, type\_keyval, extra\_state)}
\funcarg{\IN}{type\_copy\_attr\_fn}{copy callback function for \mpiarg{type\_keyval} (function)}
\funcarg{\IN}{type\_delete\_attr\_fn}{delete callback function for \mpiarg{type\_keyval} (function)}
\funcarg{\OUT}{type\_keyval}{key value for future access (integer)}
\funcarg{\IN}{extra\_state}{extra state for callback functions}
\end{funcdef}
\mpibind{MPI\_Type\_create\_keyval(MPI\_Type\_copy\_attr\_function~*type\_copy\_attr\_fn, MPI\_Type\_delete\_attr\_function~*type\_delete\_attr\_fn, int~*type\_keyval, void~*extra\_state)}
\mpifbind{MPI\_TYPE\_CREATE\_KEYVAL(TYPE\_COPY\_ATTR\_FN, TYPE\_DELETE\_ATTR\_FN, TYPE\_KEYVAL, EXTRA\_STATE, IERROR)\fargs EXTERNAL TYPE\_COPY\_ATTR\_FN, TYPE\_DELETE\_ATTR\_FN\\INTEGER TYPE\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) EXTRA\_STATE}
\begchangefinii
\mpicppemptybind{MPI::Datatype::Create\_keyval(MPI::Datatype::Copy\_attr\_function* type\_copy\_attr\_fn, MPI::Datatype::Delete\_attr\_function* type\_delete\_attr\_fn, void*~extra\_state)}{static int}
\endchangefinii
\begchangefinii
The argument \mpiarg{type\_copy\_attr\_fn} may be specified as
\mpifuncmainindex{MPI\_TYPE\_NULL\_COPY\_FN}\mpiskipfunc{MPI\_TYPE\_NULL\_COPY\_FN} or
\mpifuncmainindex{MPI\_TYPE\_DUP\_FN}\mpiskipfunc{MPI\_TYPE\_DUP\_FN}
from either C, C++, or Fortran.
\mpifunc{MPI\_TYPE\_NULL\_COPY\_FN} is a function that does nothing
other than returning \mpiarg{flag = 0} and \consti{MPI\_SUCCESS}.
\mpifunc{MPI\_TYPE\_DUP\_FN} is a simple-minded copy function that sets
\mpiarg{flag = 1}, returns the value of \mpiarg{attribute\_val\_in} in
\mpiarg{attribute\_val\_out}, and returns \consti{MPI\_SUCCESS}.
\begchangefinii
The argument \mpiarg{type\_delete\_attr\_fn} may be specified as
\endchangefinii
\mpifuncmainindex{MPI\_TYPE\_NULL\_DELETE\_FN}\mpiskipfunc{MPI\_TYPE\_NULL\_DELETE\_FN}
from either C, C++, or Fortran.
\mpiskipfunc{MPI\_TYPE\_NULL\_DELETE\_FN} is a function that does
nothing, other than returning \consti{MPI\_SUCCESS}.
\endchangefinii The C callback functions are: \begchangefinii \mpitypedefbind{MPI\_Type\_copy\_attr\_function(MPI\_Datatype~oldtype, int~type\_keyval, void~*extra\_state, void~*attribute\_val\_in, void~*attribute\_val\_out, int~*flag)} and \mpitypedefbind{MPI\_Type\_delete\_attr\_function(MPI\_Datatype~type, int~type\_keyval, void~*attribute\_val, void~*extra\_state)} \endchangefinii The Fortran callback functions are: \mpifsubbind{TYPE\_COPY\_ATTR\_FN(OLDTYPE, TYPE\_KEYVAL, EXTRA\_STATE, ATTRIBUTE\_VAL\_IN, ATTRIBUTE\_VAL\_OUT, FLAG, IERROR)\fargs INTEGER OLDTYPE, TYPE\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) EXTRA\_STATE,\\ \ \ \ \ ATTRIBUTE\_VAL\_IN, ATTRIBUTE\_VAL\_OUT\\LOGICAL FLAG} and \mpifsubbind{TYPE\_DELETE\_ATTR\_FN(TYPE, TYPE\_KEYVAL, ATTRIBUTE\_VAL, EXTRA\_STATE, IERROR)\fargs INTEGER TYPE, TYPE\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) ATTRIBUTE\_VAL, EXTRA\_STATE} The C++ callbacks are: \begchangefinii \mpicpptypedefemptybind{MPI::Datatype::Copy\_attr\_function(const~MPI::Datatype\&~oldtype, int~type\_keyval, void*~extra\_state, const~void*~attribute\_val\_in, void*~attribute\_val\_out, bool\&~flag)}{int} and \mpicpptypedefemptybind{MPI::Datatype::Delete\_attr\_function(MPI::Datatype\&~type, int~type\_keyval, void*~attribute\_val, void*~extra\_state)}{int} \endchangefini \endchangefinii \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines \mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 2.3.8, page 26, lines 38-39, File 2.0/misc-1.2.tex, lines 589-592 If an attribute copy function or attribute delete function returns other than \consti{MPI\_SUCCESS}, then the call that caused it to be invoked (for example, \mpifunc{MPI\_TYPE\_FREE}), is erroneous. \mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines \mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 
% 8.8.2, p.206 l.13 - p.204 l.44, File 2.0/ei-2.tex, lines 2645-2696
\begin{funcdef}{MPI\_TYPE\_FREE\_KEYVAL(type\_keyval)}
\funcarg{\INOUT}{type\_keyval}{key value (integer)}
\end{funcdef}
\mpibind{MPI\_Type\_free\_keyval(int *type\_keyval)}
\mpifbind{MPI\_TYPE\_FREE\_KEYVAL(TYPE\_KEYVAL, IERROR)\fargs INTEGER TYPE\_KEYVAL, IERROR}
\mpicppemptybind{MPI::Datatype::Free\_keyval(int\& type\_keyval)}{static void}
\begchangefini
\begin{funcdef}{MPI\_TYPE\_SET\_ATTR(type, type\_keyval, attribute\_val)}
\funcarg{\INOUT}{type}{datatype to which attribute will be attached (handle)}
\funcarg{\IN}{type\_keyval}{key value (integer)}
\funcarg{\IN}{attribute\_val}{attribute value}
\end{funcdef}
\mpibind{MPI\_Type\_set\_attr(MPI\_Datatype~type, int~type\_keyval, void~*attribute\_val)}
\mpifbind{MPI\_TYPE\_SET\_ATTR(TYPE, TYPE\_KEYVAL, ATTRIBUTE\_VAL, IERROR)\fargs INTEGER TYPE, TYPE\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) ATTRIBUTE\_VAL}
\mpicppemptybind{MPI::Datatype::Set\_attr(int type\_keyval, const void* attribute\_val)}{void}
\endchangefini
\begin{funcdef}{MPI\_TYPE\_GET\_ATTR(type, type\_keyval, attribute\_val, flag)}
\funcarg{\IN}{type}{datatype to which the attribute is attached (handle)}
\funcarg{\IN}{type\_keyval}{key value (integer)}
\funcarg{\OUT}{attribute\_val}{attribute value, unless \mpiarg{flag = false}}
\funcarg{\OUT}{flag}{\consti{false} if no attribute is associated with the key (logical)}
\end{funcdef}
\mpibind{MPI\_Type\_get\_attr(MPI\_Datatype type, int type\_keyval, void *attribute\_val, int *flag)}
\mpifbind{MPI\_TYPE\_GET\_ATTR(TYPE, TYPE\_KEYVAL, ATTRIBUTE\_VAL, FLAG, IERROR)\fargs INTEGER TYPE, TYPE\_KEYVAL, IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) ATTRIBUTE\_VAL\\LOGICAL FLAG}
\mpicppemptybind{MPI::Datatype::Get\_attr(int type\_keyval, void* attribute\_val) const}{bool}
\begin{funcdef}{MPI\_TYPE\_DELETE\_ATTR(type, type\_keyval)}
\funcarg{\INOUT}{type}{datatype from which the attribute is deleted (handle)}
\funcarg{\IN}{type\_keyval}{key value (integer)}
\end{funcdef}
\mpibind{MPI\_Type\_delete\_attr(MPI\_Datatype type, int type\_keyval)}
\mpifbind{MPI\_TYPE\_DELETE\_ATTR(TYPE, TYPE\_KEYVAL, IERROR)\fargs INTEGER TYPE, TYPE\_KEYVAL, IERROR}
\begchangefinii
\mpicppemptybind{MPI::Datatype::Delete\_attr(int type\_keyval)}{void}
\endchangefinii
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
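\begin{users}
As an illustrative fragment (a sketch only; \mpiarg{newtype} is assumed
to be an existing derived datatype, and all other names are
placeholders), a library can tag a datatype with private decoding
information; with \mpifunc{MPI\_TYPE\_DUP\_FN} the tag is inherited by
duplicates of the datatype:
\begin{verbatim}
static int type_key = MPI_KEYVAL_INVALID;
MPI_Datatype copytype;
void *decode_info;  /* assumed to point to the library's data */
void *info_back;
int flag;

MPI_Type_create_keyval(MPI_TYPE_DUP_FN, MPI_TYPE_NULL_DELETE_FN,
                       &type_key, (void *)0);
MPI_Type_set_attr(newtype, type_key, decode_info);
MPI_Type_dup(newtype, &copytype);   /* copy callback runs here */
MPI_Type_get_attr(copytype, type_key, &info_back, &flag);
/* flag is true and info_back equals decode_info */
\end{verbatim}
\end{users}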
\mpiiidotiMergeFromTWOdotZERObegin% MPI-2.1 - take lines: MPI-2.0, Sect. 4.6, p.42 l.10 - p.42 l.21, File 2.0/misc-2.tex, lines 677-703
\mpiiidotiMergeFromREVIEWbegin{5.e'}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
% Because MPI_ERR_KEYVAL is now presented in the list of error classes,
% it is more convenient that the following section is moved to here
\subsection{Error Class for Invalid Keyval}
\label{subsec:ei-attr:invalidkeyval}
\mpiiidotiMergeFromREVIEWendII{5.e'}% MPI-2.1 End of review based correction
\begchangeoct
\status{Passed twice.}
\begchangejan
Key values for attributes are system-allocated, by
\mpifuncindex{MPI\_TYPE\_CREATE\_KEYVAL}%
\mpifuncindex{MPI\_COMM\_CREATE\_KEYVAL}%
\mpifuncindex{MPI\_WIN\_CREATE\_KEYVAL}%
\mpiskipfunc{MPI\_\{TYPE,COMM,WIN\}\_CREATE\_KEYVAL}.
Only such values can be passed to the functions that use key values as
input arguments.  In order to signal that an erroneous key value has
been passed to one of these functions, there is a new \MPI/ error
class: \error{MPI\_ERR\_KEYVAL}.
It can be
\endchangejan
returned by \mpifunc{MPI\_ATTR\_PUT}, \mpifunc{MPI\_ATTR\_GET},
\mpifunc{MPI\_ATTR\_DELETE}, \mpifunc{MPI\_KEYVAL\_FREE},
\begchangefinii
\mpifuncindex{MPI\_TYPE\_DELETE\_ATTR}%
\mpifuncindex{MPI\_COMM\_DELETE\_ATTR}%
\mpifuncindex{MPI\_WIN\_DELETE\_ATTR}%
\mpifuncindex{MPI\_TYPE\_SET\_ATTR}%
\mpifuncindex{MPI\_COMM\_SET\_ATTR}%
\mpifuncindex{MPI\_WIN\_SET\_ATTR}%
\mpifuncindex{MPI\_TYPE\_GET\_ATTR}%
\mpifuncindex{MPI\_COMM\_GET\_ATTR}%
\mpifuncindex{MPI\_WIN\_GET\_ATTR}%
\mpifuncindex{MPI\_TYPE\_FREE\_KEYVAL}%
\mpifuncindex{MPI\_COMM\_FREE\_KEYVAL}%
\mpifuncindex{MPI\_WIN\_FREE\_KEYVAL}%
\mpiskipfunc{MPI\_\{TYPE,COMM,WIN\}\_DELETE\_ATTR},
\mpiskipfunc{MPI\_\{TYPE,COMM,WIN\}\_SET\_ATTR},
\mpiskipfunc{MPI\_\{TYPE,COMM,WIN\}\_GET\_ATTR},
\mpiskipfunc{MPI\_\{TYPE,COMM,WIN\}\_FREE\_KEYVAL},
\endchangefinii
\begchangeapr
\mpifunc{MPI\_COMM\_DUP}, \mpifunc{MPI\_COMM\_DISCONNECT}, and
\mpifunc{MPI\_COMM\_FREE}.  The last three are included
\endchangeapr
because \mpiarg{keyval} is an argument to the copy and delete functions
for attributes.
\mpiiidotiMergeFromTWOdotZEROend% MPI-2.1 - end of take lines
\mpiiidotiMergeFromONEdotTHREEbegin% MPI-2.1 - take lines: MPI-1.1, Chap. 5, p.172 l.1 - p.175 l.25, File 1.3/context.tex, lines 2386-2597
\subsection{Attributes Example}
\label{ex:comm-attributes}
\begin{users}
This example shows how to write a collective communication operation
that uses caching to be more efficient after the first call.  The
coding style assumes that \MPI/ function results return only error
statuses.
\end{users}
\mpiiidotiMergeFromREVIEWbegin{9.o-q}% MPI-2.1 Correction due to Reviews to MPI-2.1 draft Feb.23, 2008
% In the following verbatim, deprecated functions have been substituted:
%    if ( ! MPI_Keyval_create( gop_stuff_copier,
%                              gop_stuff_destructor,
%                              &gop_key, (void *)0));
%    MPI_Attr_get (comm, gop_key, &gop_stuff, &foundflag);
%    MPI_Attr_put ( comm, gop_key, gop_stuff);
\mpiiidotiMergeFromREVIEWendII{9.o-q}% MPI-2.1 End of review based correction
%MPI-1.2 \CHANGE{Errata for MPI-1.1, p. 6, l. 40-48}{Fix name when calling \func{MPI\_Keyval\_create}}
\begin{verbatim}
/* key for this module's stuff: */
static int gop_key = MPI_KEYVAL_INVALID;

typedef struct
{
   int ref_count;          /* reference count */
   /* other stuff, whatever else we want */
} gop_stuff_type;

Efficient_Collective_Op (comm, ...)
MPI_Comm comm;
{
  gop_stuff_type *gop_stuff;
  int foundflag;

  if (gop_key == MPI_KEYVAL_INVALID) /* get a key on first call ever */
  {
    /* get the key while assigning its copy and delete
       callback behavior; abort on failure */
    if (MPI_Comm_create_keyval (gop_stuff_copier,
                                gop_stuff_destructor,
                                &gop_key, (void *)0) != MPI_SUCCESS)
      MPI_Abort (comm, 99);
  }

  MPI_Comm_get_attr (comm, gop_key, &gop_stuff, &foundflag);
  if (foundflag)
  { /* This module has executed in this group before.
       We will use the cached information */
  }
  else
  { /* This is a group that we have not yet cached anything in.
       We will now do so.
    */

    /* First, allocate storage for the stuff we want,
       and initialize the reference count */

    gop_stuff = (gop_stuff_type *) malloc (sizeof(gop_stuff_type));
    if (gop_stuff == NULL) { /* abort on out-of-memory error */ }

    gop_stuff -> ref_count = 1;

    /* Second, fill in *gop_stuff with whatever we want.
       This part isn't shown here */

    /* Third, store gop_stuff as the attribute value */
    MPI_Comm_set_attr (comm, gop_key, gop_stuff);
  }
  /* Then, in any case, use contents of *gop_stuff
     to do the global op ... */
}

/* The following routine is called by MPI when a communicator is freed */

gop_stuff_destructor (comm, keyval, gop_stuff, extra)
MPI_Comm comm;
int keyval;
gop_stuff_type *gop_stuff;
void *extra;
{
  if (keyval != gop_key) { /* abort -- programming error */ }

  /* Freeing the communicator removes one reference to gop_stuff */
  gop_stuff -> ref_count -= 1;

  /* If no references remain, then free the storage */
  if (gop_stuff -> ref_count == 0) {
    free((void *)gop_stuff);
  }
  return MPI_SUCCESS;
}

/* The following routine is called by MPI when a communicator is
   duplicated */

gop_stuff_copier (comm, keyval, extra, gop_stuff_in, gop_stuff_out, flag)
MPI_Comm comm;
int keyval;
gop_stuff_type *gop_stuff_in, **gop_stuff_out;
void *extra;
int *flag;
{
  if (keyval != gop_key) { /* abort -- programming error */ }

  /* The new communicator adds one reference to this gop_stuff */
  gop_stuff_in -> ref_count += 1;
  *gop_stuff_out = gop_stuff_in;
  *flag = 1;
  return MPI_SUCCESS;
}
\end{verbatim}
\section{Formalizing the Loosely Synchronous Model}
% Passed: 16-0-4
\label{sec:formalizing}

In this section, we make further statements about the loosely
synchronous model, with particular attention to intra-communication.

\subsection{Basic Statements}

When a caller passes a communicator (that contains a context and group)
to a callee, that communicator must be free of side effects throughout
execution of the subprogram: there should be no active operations on
that communicator that might involve the process.  This provides one
model in which libraries can be written, and work ``safely.''  For
libraries so designated, the callee has permission to do whatever
communication it likes with the communicator, and under the above
guarantee knows that no other communications will interfere.  Since we
permit good implementations to create new communicators without
synchronization (such as by preallocated contexts on communicators),
this does not impose a significant overhead.

This form of safety is analogous to other common computer-science
usages, such as passing a descriptor of an array to a library routine.
The library routine has every right to expect such a descriptor to be
valid and modifiable.

\subsection{Models of Execution}

In the loosely synchronous model, transfer of control to a {\bf
parallel procedure} is effected by having each executing process invoke
the procedure.  The invocation is a collective operation: it is
executed by all processes in the execution group, and invocations are
similarly ordered at all processes.  However, the invocation need not
be synchronized.

We say that a parallel procedure is {\em active} in a process if the
process belongs to a group that may collectively execute the procedure,
and some member of that group is currently executing the procedure
code.  If a parallel procedure is active in a process, then this
process may be receiving messages pertaining to this procedure, even if
it does not currently execute the code of this procedure.

\subsubsection{Static communicator allocation}

This covers the case where, at any point in time, at most one
invocation of a parallel procedure can be active at any process, and
the group of executing processes is fixed.  For example, all
invocations of parallel procedures involve all processes, processes are
single-threaded, and there are no recursive invocations.

In such a case, a communicator can be statically allocated to each
procedure.  The static allocation can be done in a preamble, as part of
initialization code.  If the parallel procedures can be organized into
libraries, so that only one procedure of each library can be
concurrently active in each process, then it is sufficient to allocate
one communicator per library.
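\begin{users}
A minimal sketch of static allocation (illustrative only; all names are
placeholders): the library duplicates the users' communicator once, in
an initialization routine, and performs all of its internal
communication on the private duplicate.
\begin{verbatim}
static MPI_Comm lib_comm = MPI_COMM_NULL;

void lib_init (MPI_Comm user_comm)  /* called once, in the preamble */
{
  MPI_Comm_dup (user_comm, &lib_comm);
}

void lib_compute (void)             /* any library entry point */
{
  /* all internal sends, receives, and collectives use lib_comm,
     so they cannot match communication pending on user
     communicators */
}

void lib_done (void)
{
  MPI_Comm_free (&lib_comm);
}
\end{verbatim}
\end{users}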
\subsubsection{Dynamic communicator allocation}

Calls of parallel procedures are well-nested if a new parallel
procedure is always invoked in a subset of a group executing the same
parallel procedure.  Thus, processes that execute the same parallel
procedure have the same execution stack.

In such a case, a new communicator needs to be dynamically allocated
for each new invocation of a parallel procedure.  The allocation is
done by the caller.  A new communicator can be generated by a call to
\mpifunc{MPI\_COMM\_DUP}, if the callee execution group is identical to
the caller execution group, or by a call to \mpifunc{MPI\_COMM\_SPLIT}
if the caller execution group is split into several subgroups executing
distinct parallel routines.  The new communicator is passed as an
argument to the invoked routine.

The need for generating a new communicator at each invocation can be
alleviated or avoided altogether in some cases: if the execution group
is not split, then one can allocate a stack of communicators in a
preamble, and then manage the stack in a way that mimics the stack of
recursive calls.

One can also take advantage of the well-ordering property of
communication to avoid confusing caller and callee communication, even
if both use the same communicator.  To do so, one needs to abide by the
following two rules:
\begin{itemize}
\item
messages sent before a procedure call (or before a return from the
procedure) are also received before the matching call (or return) at
the receiving end;
\item
messages are always selected by source (no use is made of
\const{MPI\_ANY\_SOURCE}).
\end{itemize}

\subsubsection{The general case}

In the general case, there may be multiple concurrently active
invocations of the same parallel procedure within the same group;
invocations may not be well-nested.  A new communicator needs to be
created for each invocation.  It is the user's responsibility to ensure
that, if two distinct parallel procedures are invoked concurrently on
overlapping sets of processes, communicator creation is properly
coordinated.
\mpiiidotiMergeFromONEdotTHREEend% MPI-2.1 - end of take lines
% MPI-2.1 - unused lines: MPI-2.0, Chap. 8 (comments), File 2.0/ei-2.tex, lines 2733-2749 (obsolete)
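\begin{users}
A minimal sketch of per-invocation allocation for the general case
(illustrative only; names are placeholders): each invocation of a
parallel procedure creates, uses, and frees its own communicator, so
concurrent invocations cannot interfere.
\begin{verbatim}
void parallel_procedure (MPI_Comm caller_comm)
{
  MPI_Comm comm;

  MPI_Comm_dup (caller_comm, &comm);  /* collective over the group */
  /* ... all communication of this invocation uses comm ... */
  MPI_Comm_free (&comm);
}
\end{verbatim}
If the caller execution group is split, \mpifunc{MPI\_COMM\_SPLIT}
would be used instead of \mpifunc{MPI\_COMM\_DUP}.
\end{users}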