[mpiwg-rma] TLS

Balaji, Pavan balaji at anl.gov
Wed Nov 7 17:53:43 CST 2018


Nathan,

If we had a “win_dup” function (or if you could emulate it with win_create), plus an info key saying that the window object will only be accessed in a thread-serialized fashion, wouldn't that achieve the same outcome for you?

  — Pavan

Sent from my iPhone

> On Nov 7, 2018, at 5:18 PM, Hammond, Jeff R <jeff.r.hammond at intel.com> wrote:
> 
> Do we have a robust definition of "thread" anywhere in the MPI standard?  The standard is oblivious to POSIX as well as to ISO C11 and C++11, so I'm not sure how we can define anything.  As it is, MPI_Is_thread_main (or whatever the name is) is an albatross...
> 
> Jeff
> 
> On 11/7/18, 2:55 PM, "Nathan Hjelm" <hjelmn at lanl.gov> wrote:
> 
> 
> 
>    Thanks for sharing that info, Jim and Jeff. This matches my
>    observations with gcc 6.x and newer. By assigning an implicit
>    device context to each thread, I see no apparent drop in message
>    rate for small puts when single-threaded and a huge boost in
>    performance when multi-threaded. This is why, in Open MPI, I have
>    adopted this model as the default for a number of transports:
> 
>    ucx:
> 
>    https://github.com/open-mpi/ompi/blob/master/opal/mca/btl/uct/btl_uct_device_context.h#L66
> 
>    ugni:
> 
>    https://github.com/open-mpi/ompi/blob/master/opal/mca/btl/ugni/btl_ugni.h#L572
> 
> 
>    ofi:
> 
>    https://github.com/open-mpi/ompi/blob/master/opal/mca/btl/ofi/btl_ofi_context.c#L343
> 
> 
> 
>    Open MPI has been able to achieve near-perfect scaling with
>    threads and RMA with the RMA-MT benchmarks
>    (http://github.com/hpc/rma-mt). I have some graphs that I will
>    show when I go deeper into the MPI_Win_flush_thread proposal.
> 
>    -Nathan
> 
> 
>>    On Tue, Nov 06, 2018 at 02:30:42AM +0000, Dinan, James wrote:
>> ELF has .tdata sections for thread-local storage, which make __thread storage as fast as any other memory reference.  There's a detailed article by Ulrich Drepper linked by the second page Jeff mentioned.  Recommended reading for anyone who wants more info.
>> 
>> ~Jim.
>> 
>> -----Original Message-----
>> From: "Hammond, Jeff R" <jeff.r.hammond at intel.com>
>> Date: Monday, November 5, 2018 at 7:23 PM
>> To: Pavan Balaji <balaji at anl.gov>, "mpiwg-rma at lists.mpi-forum.org" <mpiwg-rma at lists.mpi-forum.org>, "Thakur, Rajeev" <thakur at anl.gov>, "Dinan, James" <james.dinan at intel.com>, Nathan Hjelm <hjelmn at lanl.gov>, William Gropp <wgropp at illinois.edu>, "Bland, Wesley" <wesley.bland at intel.com>
>> Subject: TLS
>> 
>>    I can't find the email where Jim and I discussed this, but the following links address the anti-TLS position:
>> 
>>    https://software.intel.com/en-us/blogs/2011/05/02/the-hidden-performance-cost-of-accessing-thread-local-variables
>> 
>>    http://david-grs.github.io/tls_performance_overhead_cost_linux/
>> 
>>    Jeff
>> 
>> 
>> 
>> 
> 
> 

