[mpiwg-rma] TLS
Balaji, Pavan
balaji at anl.gov
Wed Nov 7 17:53:43 CST 2018
Nathan,
If we had a “win_dup” function (or if you emulated it with win_create), plus an info key saying that the window object will only be accessed in a thread-serialized fashion, would it not achieve the same outcome for you?
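
Roughly like the sketch below; both the helper and the info key name ("mpi_thread_serialized_access") are made up here, since neither win_dup nor such a key exists in the standard today.

#include <mpi.h>

/* Hypothetical sketch: emulate "win_dup" by creating a second window
 * over the same memory, tagged with an invented info key that marks
 * its accesses as thread-serialized. */
MPI_Win win_dup_emulated(void *base, MPI_Aint size, int disp_unit,
                         MPI_Comm comm)
{
    MPI_Info info;
    MPI_Win win;

    MPI_Info_create(&info);
    /* invented key: window is only accessed by one thread at a time */
    MPI_Info_set(info, "mpi_thread_serialized_access", "true");
    MPI_Win_create(base, size, disp_unit, info, comm, &win);
    MPI_Info_free(&info);
    return win;
}
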
— Pavan
Sent from my iPhone
> On Nov 7, 2018, at 5:18 PM, Hammond, Jeff R <jeff.r.hammond at intel.com> wrote:
>
> Do we have a robust definition of "thread" anywhere in the MPI standard? The standard is oblivious to POSIX as well as ISO C11 and C++11, so I'm not sure how we can define anything. As it is, MPI_Is_thread_main (or whatever the name is) is an albatross...
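>
> For reference, a minimal sketch of that call; it only reports whether the caller is the thread that initialized MPI, and the standard never says what a "thread" is:
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>     int provided, flag;
>
>     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>
>     /* true only on the thread that called MPI_Init_thread; what a
>      * "thread" is here is left undefined by the standard */
>     MPI_Is_thread_main(&flag);
>     printf("main thread: %d, provided: %d\n", flag, provided);
>
>     MPI_Finalize();
>     return 0;
> }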
>
> Jeff
>
> On 11/7/18, 2:55 PM, "Nathan Hjelm" <hjelmn at lanl.gov> wrote:
>
>
>
> Thanks for sharing that info, Jim and Jeff. This matches my
> observations with gcc 6.x and newer. By assigning an implicit
> device context to each thread, I see no apparent drop in message
> rate for small puts when single-threaded, and a huge boost in
> performance when multi-threaded. This is why I have adopted this
> model as the default for a number of transports in Open MPI (a
> simplified sketch of the idea follows the links):
>
> ucx:
>
> https://github.com/open-mpi/ompi/blob/master/opal/mca/btl/uct/btl_uct_device_context.h#L66
>
> ugni:
>
> https://github.com/open-mpi/ompi/blob/master/opal/mca/btl/ugni/btl_ugni.h#L572
>
>
> ofi:
>
> https://github.com/open-mpi/ompi/blob/master/opal/mca/btl/ofi/btl_ofi_context.c#L343
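>
> A simplified sketch of the model (not the actual Open MPI code; the
> context type and allocation call below are placeholders):
>
> #include <stddef.h>
>
> typedef struct device_context device_context_t;
> device_context_t *device_context_alloc(void); /* placeholder */
>
> /* one implicit device context per thread, allocated lazily on the
>  * thread's first RMA operation */
> static __thread device_context_t *tl_context = NULL;
>
> static inline device_context_t *get_device_context(void)
> {
>     if (NULL == tl_context) {
>         tl_context = device_context_alloc();
>     }
>     /* thread-private, so the fast path needs no lock */
>     return tl_context;
> }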
>
>
>
> Open MPI has been able to achieve near-perfect scaling for
> threaded RMA using the RMA-MT benchmarks
> (http://github.com/hpc/rma-mt). I have some graphs that I will
> show when I go deeper into the MPI_Win_flush_thread proposal.
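>
> As a sketch of the pattern RMA-MT measures (assuming
> MPI_THREAD_MULTIPLE, a window already opened with MPI_Win_lock_all,
> and buf holding one element per thread; today's MPI_Win_flush
> completes operations from all threads, which is the gap the
> flush_thread proposal is meant to close):
>
> #include <mpi.h>
> #include <omp.h>
>
> /* each thread issues its own puts and flushes to the target rank */
> void threaded_puts(MPI_Win win, double *buf, int iters, int target)
> {
>     #pragma omp parallel
>     {
>         int me = omp_get_thread_num();
>         for (int i = 0; i < iters; i++) {
>             MPI_Put(&buf[me], 1, MPI_DOUBLE, target,
>                     me /* target displacement */, 1, MPI_DOUBLE, win);
>             /* completes pending operations from *all* threads */
>             MPI_Win_flush(target, win);
>         }
>     }
> }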
>
> -Nathan
>
>
>> On Tue, Nov 06, 2018 at 02:30:42AM +0000, Dinan, James wrote:
>> ELF has .tdata sections for thread-local storage, which make __thread storage as fast as any other memory reference. There's a detailed article by Ulrich Drepper linked from the second page Jeff mentioned. Recommended reading for anyone who wants more info.
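>>
>> A toy example: with the initial-exec TLS model, each access to a __thread variable compiles to a fixed offset from the thread pointer, so there is no lock or lookup involved:
>>
>> #include <pthread.h>
>> #include <stdio.h>
>>
>> static __thread long counter = 0; /* lives in the ELF TLS segment */
>>
>> static void *worker(void *arg)
>> {
>>     (void) arg;
>>     for (int i = 0; i < 1000000; ++i) {
>>         counter++; /* each thread bumps its own copy, no lock */
>>     }
>>     printf("thread-local counter: %ld\n", counter);
>>     return NULL;
>> }
>>
>> int main(void)
>> {
>>     pthread_t t1, t2;
>>     pthread_create(&t1, NULL, worker, NULL);
>>     pthread_create(&t2, NULL, worker, NULL);
>>     pthread_join(t1, NULL);
>>     pthread_join(t2, NULL);
>>     return 0; /* prints 1000000 twice */
>> }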
>>
>> ~Jim.
>>
>> -----Original Message-----
>> From: "Hammond, Jeff R" <jeff.r.hammond at intel.com>
>> Date: Monday, November 5, 2018 at 7:23 PM
>> To: Pavan Balaji <balaji at anl.gov>, "mpiwg-rma at lists.mpi-forum.org" <mpiwg-rma at lists.mpi-forum.org>, "Thakur, Rajeev" <thakur at anl.gov>, "Dinan, James" <james.dinan at intel.com>, Nathan Hjelm <hjelmn at lanl.gov>, William Gropp <wgropp at illinois.edu>, "Bland, Wesley" <wesley.bland at intel.com>
>> Subject: TLS
>>
>> I can't find the email where Jim and I discussed this, but the following articles address the anti-TLS position:
>>
>> https://software.intel.com/en-us/blogs/2011/05/02/the-hidden-performance-cost-of-accessing-thread-local-variables
>>
>> http://david-grs.github.io/tls_performance_overhead_cost_linux/
>>
>> Jeff
>>