#208: MPI3 Hybrid Programming: multiple endpoints per collective, option 1
Comment(by dougmill@…):

 If the compute/communicate pattern were not (otherwise) synchronized
 between the threads, then the program could not do horizontal parallelism
 this way. If each agent were truly independent, and the
 communication/computation in one agent did not depend on that of other
 agents, then the only way to get horizontal parallelism would be to add more
 threads to each endpoint. It is not clear whether multiple methods for
 horizontal parallelism should be offered - e.g. could a program use either
 join/leave or multiple-attach (or even both at different places in the

