Commit fb0edc8

author
Henry Jin
committed
merge with examples-internal/v5.1
1 parent 60e8ece commit fb0edc8

656 files changed

+4524
-898
lines changed

Chap_SIMD.tex

+7-3
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,4 @@
1-
\pagebreak
2-
\chapter{SIMD}
1+
\cchapter{SIMD}{SIMD}
32
\label{chap:simd}
43

54
Single instruction, multiple data (SIMD) is a form of parallel execution
@@ -12,7 +11,7 @@ \chapter{SIMD}
1211
Loops without loop-carried backward dependency (or with dependency preserved using
1312
ordered simd) are candidates for vectorization by the compiler for
1413
execution with SIMD units. In addition, with state-of-the-art vectorization
15-
technology and \code{declare simd} construct extensions for function vectorization
14+
technology and \code{declare simd} directive extensions for function vectorization
1615
in the OpenMP 4.5 specification, loops with function calls can be vectorized as well.
1716
The basic idea is that a scalar function call in a loop can be replaced by a vector version
1817
of the function, and the loop can be vectorized simultaneously by combining a loop
@@ -46,3 +45,8 @@ \chapter{SIMD}
4645
%\code{parallel for simd}).
4746

4847

48+
%===== Examples Sections =====
49+
\input{SIMD/SIMD}
50+
\input{SIMD/linear_modifier}
51+
52+
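
The function-vectorization idea described in this chapter can be sketched as follows. This is a minimal illustration, not taken from the examples document: the names `scale_add` and `saxpy_like` are hypothetical, and whether the compiler actually emits vector code depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/* Ask the compiler to also generate a vector version of this scalar
   function (hypothetical helper, used only for illustration). */
#pragma omp declare simd
static float scale_add(float x, float y)
{
    return 2.0f * x + y;
}

/* The loop has no loop-carried dependence, so it is a candidate for
   vectorization; the call can be replaced by the vector version. */
void saxpy_like(size_t n, const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        out[i] = scale_add(a[i], b[i]);
    }
}
```

If OpenMP is not enabled, the pragmas are ignored and the code runs as an ordinary scalar loop with the same result.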

Chap_affinity.tex

+14-7
@@ -1,5 +1,4 @@
1-
\pagebreak
2-
\chapter{OpenMP Affinity}
1+
\cchapter{OpenMP Affinity}{affinity}
32
\label{chap:openmp_affinity}
43

54
OpenMP Affinity consists of a \code{proc\_bind} policy (thread affinity policy) and a specification of
@@ -53,21 +52,21 @@ \chapter{OpenMP Affinity}
5352
%which sets \code{OMP\_PLACES} specifically for the MPI process.
5453

5554
Threads of a team are positioned onto places in a compact manner, a
56-
scattered distribution, or onto the master's place, by setting the
55+
scattered distribution, or onto the primary thread's place, by setting the
5756
\code{OMP\_PROC\_BIND} environment variable or the \code{proc\_bind} clause to
58-
\plc{close}, \plc{spread}, or \plc{master}, respectively. When
57+
\code{close}, \code{spread}, or \code{primary} (\code{master} has been deprecated), respectively. When
5958
\code{OMP\_PROC\_BIND} is set to FALSE no binding is enforced; and
6059
when the value is TRUE, the binding is implementation defined to
6160
a set of places in the \code{OMP\_PLACES} variable or to places
6261
defined by the implementation if the \code{OMP\_PLACES} variable
63-
is not set.
62+
is not set.
6463

6564
The \code{OMP\_PLACES} variable can also be set to an abstract name
66-
(\plc{threads}, \plc{cores}, \plc{sockets}) to specify that a place is
65+
(\code{threads}, \code{cores}, \code{sockets}) to specify that a place is
6766
either a single hardware thread, a core, or a socket, respectively.
6867
This description of the \code{OMP\_PLACES} is most useful when the
6968
number of threads is equal to the number of hardware threads, cores
70-
or sockets. It can also be used with a \plc{close} or \plc{spread}
69+
or sockets. It can also be used with a \code{close} or \code{spread}
7170
distribution policy when the equality doesn't hold.
7271

7372

@@ -116,3 +115,11 @@ \chapter{OpenMP Affinity}
116115
% thread # 0 * * * * _ _ _ _ _ _ _ _ #mask for thread 0
117116
% thread # 0 _ _ _ _ * * * * _ _ _ _ #mask for thread 1
118117
% thread # 0 _ _ _ _ _ _ _ _ * * * * #mask for thread 2
118+
119+
120+
%===== Examples Sections =====
121+
\input{affinity/affinity}
122+
\input{affinity/task_affinity}
123+
\input{affinity/affinity_display}
124+
\input{affinity/affinity_query}
125+
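
As a rough sketch of the binding policies described in this chapter, the function below requests a compact placement with the `proc_bind(close)` clause. The function name `count_threads_close` is hypothetical, and the set of places is assumed to come from the environment, for instance by running the program with `OMP_PLACES=cores`.

```c
#include <assert.h>

/* Count the threads of a team whose threads are positioned onto places
   in a compact manner (hypothetical helper for illustration). */
int count_threads_close(void)
{
    int count = 0;
    #pragma omp parallel proc_bind(close) reduction(+:count)
    {
        count += 1;   /* each thread of the team contributes once */
    }
    assert(count >= 1);   /* at least the primary thread ran the region */
    return count;
}
```

Replacing `close` with `spread` in the clause would request a scattered distribution instead; the returned team size is unaffected by the policy.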

Chap_data_environment.tex

+20-2
@@ -1,5 +1,4 @@
1-
\pagebreak
2-
\chapter{Data Environment}
1+
\cchapter{Data Environment}{data_environment}
32
\label{chap:data_environment}
43
The OpenMP \plc{data environment} contains data attributes of variables and
54
objects. Many constructs (such as \code{parallel}, \code{simd}, \code{task})
@@ -73,3 +72,22 @@ \chapter{Data Environment}
7372
map regions and/or accumulative (unstructured) mappings, determines the operation.
7473
Details of the \code{map} clause and reference count operation are specified
7574
in the \plc{map Clause} subsection of the OpenMP Specifications document.
75+
76+
77+
%===== Examples Sections =====
78+
\input{data_environment/threadprivate}
79+
\input{data_environment/default_none}
80+
\input{data_environment/private}
81+
\input{data_environment/fort_loopvar}
82+
\input{data_environment/fort_sp_common}
83+
\input{data_environment/fort_sa_private}
84+
\input{data_environment/carrays_fpriv}
85+
\input{data_environment/lastprivate}
86+
\input{data_environment/reduction}
87+
\input{data_environment/udr}
88+
\input{data_environment/scan}
89+
\input{data_environment/copyin}
90+
\input{data_environment/copyprivate}
91+
\input{data_environment/cpp_reference}
92+
\input{data_environment/associate}
93+

Chap_devices.tex

+24-2
@@ -1,5 +1,4 @@
1-
\pagebreak
2-
\chapter{Devices}
1+
\cchapter{Devices}{devices}
32
\label{chap:devices}
43

54
The \code{target} construct consists of a \code{target} directive
@@ -51,3 +50,26 @@ \chapter{Devices}
5150
pre-4.5 code; it is a necessary element for asynchronous
5251
execution of the \code{target} region when using the new \code{nowait}
5352
clause introduced in OpenMP 4.5.
53+
54+
55+
%===== Examples Sections =====
56+
\input{devices/target}
57+
\input{devices/target_defaultmap}
58+
\input{devices/target_pointer_mapping}
59+
\input{devices/target_structure_mapping}
60+
\input{devices/target_fort_allocatable_array_mapping}
61+
\input{devices/array_sections}
62+
\input{devices/array_shaping}
63+
\input{devices/target_mapper}
64+
\input{devices/target_data}
65+
\input{devices/target_unstructured_data}
66+
\input{devices/target_update}
67+
\input{devices/target_associate_ptr}
68+
\input{devices/declare_target}
69+
\input{devices/teams}
70+
\input{devices/async_target_depend}
71+
\input{devices/async_target_with_tasks}
72+
\input{devices/async_target_nowait}
73+
\input{devices/async_target_nowait_depend}
74+
\input{devices/device}
75+

Chap_directives.tex

+45
@@ -0,0 +1,45 @@
1+
\cchapter{OpenMP Directive Syntax}{directives}
2+
\label{chap:directive_syntax}
3+
4+
OpenMP \emph{directives} use base-language mechanisms to specify OpenMP program behavior.
5+
In C code, the directives are formed exclusively with pragmas, whereas in C++
6+
code, directives are formed from either pragmas or attributes.
7+
Fortran directives are formed with comments in free form and fixed form sources (codes).
8+
All of these mechanisms allow the compilation to ignore the OpenMP directives if
9+
OpenMP is not supported or enabled.
10+
11+
12+
The OpenMP directive is a combination of the base-language mechanism and a \plc{directive-specification},
13+
as shown below. The \plc{directive-specification} consists
14+
of the \plc{directive-name}, which seldom has arguments,
15+
followed by optional \plc{clauses}. Full details of the syntax can be found in the OpenMP Specification.
16+
Illustrations of the syntax are given in the examples.
17+
18+
The formats for combining a base-language mechanism and a \plc{directive-specification} are:
19+
20+
C/C++ pragmas
21+
\begin{indentedcodelist}
22+
\code{\#pragma omp} \plc{directive-specification}
23+
\end{indentedcodelist}
24+
25+
C++ attributes
26+
\begin{indentedcodelist}
27+
\code{[[omp :: directive(} \plc{directive-specification} \code{)]]}
28+
\code{[[using omp : directive(} \plc{directive-specification} \code{)]]}
29+
\end{indentedcodelist}
30+
31+
Fortran comments
32+
\begin{indentedcodelist}
33+
\code{!\$omp} \plc{directive-specification}
34+
\end{indentedcodelist}
35+
36+
where \code{c\$omp} and \code{*\$omp} may be used in Fortran fixed form sources.
37+
38+
39+
%===== Examples Sections =====
40+
\input{directives/pragmas}
41+
\input{directives/attributes}
42+
\input{directives/fixed_format_comments}
43+
\input{directives/free_format_comments}
44+
45+
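
A small sketch of the C pragma form may help tie the pieces together. In the directive below, `#pragma omp` is the base-language mechanism, `parallel for` is the directive-name, and `reduction(+:sum)` is a clause. The function name `sum_to` is hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Sum the integers 1..n using a combined parallel loop directive. */
long sum_to(int n)
{
    assert(n >= 0);
    long sum = 0;
    /* base-language mechanism | directive-name | clause */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}
```

When OpenMP is not supported or enabled, the compiler ignores the pragma and the loop runs sequentially with the same result, which is the property described above.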

Chap_loop_transformations.tex

+25
@@ -0,0 +1,25 @@
1+
\cchapter{Loop Transformations}{loop_transformations}
2+
\label{chap:loop_transformations}
3+
4+
To obtain better performance on a platform, code may need to be restructured
5+
relative to the way it is written (which is often for best readability).
6+
User-directed loop transformations accomplish this goal by providing a means
7+
to separate code semantics and its optimization.
8+
9+
A loop transformation construct states that a transformation operation is to be
10+
performed on a set of nested loops. This directive approach can target specific loops
11+
for transformation, rather than applying more time-consuming general compiler
12+
heuristic methods with compiler options that may not be able to discover
13+
optimal transformations.
14+
15+
Loop transformations can be augmented by preprocessor support or OpenMP \code{metadirective}
16+
directives, to select optimal dimension and size parameters for specific platforms,
17+
facilitating a single code base for multiple platforms.
18+
Moreover, directive-based transformations make experimenting easier,
19+
since transformation directives can be applied to specific hot spots.
20+
21+
22+
%===== Examples Sections =====
23+
\input{loop_transformations/tile}
24+
\input{loop_transformations/unroll}
25+
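
The combination of a transformation directive with preprocessor support might look like the sketch below. It is an illustration under stated assumptions: the names `TDIM` and `transpose_tiled` and the tile sizes are hypothetical, and the guard assumes that an implementation reporting `_OPENMP` of at least `202011` (OpenMP 5.1) accepts the `tile` construct. Some compilers support `tile` while reporting an older value, in which case the loop nest simply runs untransformed.

```c
#include <assert.h>

#define TDIM 8   /* hypothetical problem size for the illustration */

/* Out-of-place transpose of a TDIM x TDIM matrix. */
void transpose_tiled(double in[TDIM][TDIM], double out[TDIM][TDIM])
{
    assert(in != out);   /* in-place transposition is not handled here */

#if defined(_OPENMP) && _OPENMP >= 202011
    /* Request 4x4 tiling of the perfectly nested loop pair below. */
    #pragma omp tile sizes(4, 4)
#endif
    for (int i = 0; i < TDIM; i++) {
        for (int j = 0; j < TDIM; j++) {
            out[j][i] = in[i][j];
        }
    }
}
```

The loop body and its result are the same with or without the directive; only the iteration order changes, which is what allows the code semantics and the optimization to be kept separate.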

Chap_memory_model.tex

+8-2
@@ -1,5 +1,4 @@
1-
\pagebreak
2-
\chapter{Memory Model}
1+
\cchapter{Memory Model}{memory_model}
32
\label{chap:memory_model}
43

54
OpenMP provides a shared-memory model that allows all threads on a given
@@ -129,3 +128,10 @@ \chapter{Memory Model}
129128
% in \plc{atomic Construct} subsection of the OpenMP Specifications document).
130129

131130
% Examples 1-3 show the difficulty of synchronizing threads through \code{flush} and \code{atomic} directives.
131+
132+
133+
%===== Examples Sections =====
134+
\input{memory_model/mem_model}
135+
\input{memory_model/allocators}
136+
\input{memory_model/fort_race}
137+

Chap_ompt_interface.tex

+19
@@ -0,0 +1,19 @@
1+
\cchapter{OMPT Interface}{ompt_interface}
2+
\label{chap:ompt_interface}
3+
OMPT defines mechanisms and an API for interfacing with tools in the OpenMP program.
4+
5+
The OMPT API provides the following functionality:
6+
\begin{itemize}
7+
\addtolength{\itemindent}{1cm}
8+
\item examines the state associated with an OpenMP thread
9+
\item interprets the call stack of an OpenMP thread
10+
\item receives notification about OpenMP events
11+
\item traces activity on OpenMP target devices
12+
\item assesses implementation-dependent details
13+
\item controls a tool from an OpenMP application
14+
\end{itemize}
15+
16+
The following sections will illustrate basic mechanisms and operations of the OMPT API.
17+
18+
19+
\input{ompt_interface/ompt_start}

Chap_parallel_execution.tex

+35-11
@@ -1,5 +1,4 @@
1-
\pagebreak
2-
\chapter{Parallel Execution}
1+
\cchapter{Parallel Execution}{parallel_execution}
32
\label{chap:parallel_execution}
43

54
A single thread, the \plc{initial thread}, begins sequential execution of
@@ -10,7 +9,7 @@ \chapter{Parallel Execution}
109
forming a parallel region. An \plc{initial thread} encountering a \code{parallel}
1110
region forks (creates) a team of threads at the beginning of the
1211
\code{parallel} region, and joins them (removes from execution) at the
13-
end of the region. The initial thread becomes the master thread of the team in a
12+
end of the region. The initial thread becomes the primary thread of the team in a
1413
\code{parallel} region with a \plc{thread} number equal to zero, the other
1514
threads are numbered from 1 to number of threads minus 1.
1615
A team may be comprised of just a single thread.
@@ -19,9 +18,9 @@ \chapter{Parallel Execution}
1918
parallel region. The task that creates a parallel region is suspended while the
2019
tasks of the team are executed. A thread is tied to its task; that is,
2120
only the thread assigned to the task can execute that task. After completion
22-
of the \code{parallel} region, the master thread resumes execution of the generating task.
21+
of the \code{parallel} region, the primary thread resumes execution of the generating task.
2322

24-
%After the \code{parallel} region the master thread becomes the initial
23+
%After the \code{parallel} region the primary thread becomes the initial
2524
%thread again, and continues to execute the \plc{sequential part}.
2625

2726
Any task within a \code{parallel} region is allowed to encounter another
@@ -43,7 +42,8 @@ \chapter{Parallel Execution}
4342
the number of threads becomes an upper limit for the number of threads to be
4443
provided by the OpenMP runtime.
4544

46-
\pagebreak
45+
%\pagebreak
46+
\bigskip
4747
WORKSHARING CONSTRUCTS
4848

4949
A worksharing construct distributes the execution of the associated region
@@ -96,9 +96,33 @@ \chapter{Parallel Execution}
9696
by threads of the team.
9797

9898
\bigskip
99-
MASTER CONSTRUCT
99+
MASKED CONSTRUCT
100+
101+
The \code{masked} construct is not a worksharing construct. The \code{masked} region is
102+
executed only by the primary thread. There is no implicit barrier (and flush)
103+
at the end of the \code{masked} region; hence the other threads of the team continue
104+
execution of code statements beyond the \code{masked} region.
105+
The \code{master} construct, which has been deprecated in OpenMP 5.1, has identical semantics
106+
to the \code{masked} construct with no \code{filter} clause.
107+
108+
109+
%===== Examples Sections =====
110+
\input{parallel_execution/ploop}
111+
\input{parallel_execution/parallel}
112+
\input{parallel_execution/host_teams}
113+
\input{parallel_execution/nthrs_nesting}
114+
\input{parallel_execution/nthrs_dynamic}
115+
\input{parallel_execution/fort_do}
116+
\input{parallel_execution/nowait}
117+
\input{parallel_execution/collapse}
118+
\input{parallel_execution/linear_in_loop}
119+
\input{parallel_execution/psections}
120+
\input{parallel_execution/fpriv_sections}
121+
\input{parallel_execution/single}
122+
\input{parallel_execution/workshare}
123+
\input{parallel_execution/masked}
124+
\input{parallel_execution/loop}
125+
\input{parallel_execution/pra_iterator}
126+
\input{parallel_execution/set_dynamic_nthrs}
127+
\input{parallel_execution/get_nthrs}
100128

101-
The \code{master} construct is not a worksharing construct. The master region is
102-
is executed only by the master thread. There is no implicit barrier (and flush)
103-
at the end of the \code{master} region; hence the other threads of the team continue
104-
execution beyond code statements beyond the \code{master} region.
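
The relationship between the `masked` and `master` constructs can be sketched as below. The macro `PRIMARY_ONLY` and the function `primary_only_count` are hypothetical names for this illustration, and the version test assumes that an implementation reporting `_OPENMP` of at least `202011` accepts `masked`; otherwise the sketch falls back to the deprecated `master` form, which has the same semantics when no `filter` clause is present.

```c
#include <assert.h>

/* Select the construct by OpenMP version (assumption: 202011 == 5.1). */
#if defined(_OPENMP) && _OPENMP >= 202011
  #define PRIMARY_ONLY _Pragma("omp masked")
#else
  #define PRIMARY_ONLY _Pragma("omp master")
#endif

/* Returns how many times the primary-only block was executed. */
int primary_only_count(void)
{
    int count = 0;   /* shared; written by the primary thread only */
    #pragma omp parallel
    {
        PRIMARY_ONLY
        {
            count += 1;
        }
        /* No implicit barrier here: the other threads continue at once. */
    }
    /* The implicit barrier ending the parallel region makes the write visible. */
    assert(count == 1);
    return count;
}
```

Regardless of the team size, the block runs exactly once, so the function returns 1.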
