Commit 11f2efc

author
Henry Jin
committed
v5.2.2 release
1 parent 075683d commit 11f2efc

183 files changed

+5545
-3897
lines changed

Chap_SIMD.tex

+18-18
@@ -8,34 +8,34 @@
88
Many processors have SIMD (vector) units that can perform simultaneously
99
2, 4, 8 or more executions of the same operation (by a single SIMD unit).
1010

11-
Loops without loop-carried backward dependency (or with dependency preserved using
12-
ordered simd) are candidates for vectorization by the compiler for
11+
Loops without loop-carried backward dependences (or with dependences preserved using
12+
\kcode{ordered simd}) are candidates for vectorization by the compiler for
1313
execution with SIMD units. In addition, with state-of-the-art vectorization
14-
technology and \code{declare simd} directive extensions for function vectorization
14+
technology and \kcode{declare simd} directive extensions for function vectorization
1515
in the OpenMP 4.5 specification, loops with function calls can be vectorized as well.
1616
The basic idea is that a scalar function call in a loop can be replaced by a vector version
1717
of the function, and the loop can be vectorized simultaneously by combining a loop
18-
vectorization (\code{simd} directive on the loop) and a function
19-
vectorization (\code{declare simd} directive on the function).
18+
vectorization (\kcode{simd} directive on the loop) and a function
19+
vectorization (\kcode{declare simd} directive on the function).
2020

21-
A \code{simd} construct states that SIMD operations be performed on the
21+
A \kcode{simd} construct states that SIMD operations be performed on the
2222
data within the loop. A number of clauses are available to provide
23-
data-sharing attributes (\code{private}, \code{linear}, \code{reduction} and
24-
\code{lastprivate}). Other clauses provide vector length preference/restrictions
25-
(\code{simdlen} / \code{safelen}), loop fusion (\code{collapse}), and data
26-
alignment (\code{aligned}).
23+
data-sharing attributes (\kcode{private}, \kcode{linear}, \kcode{reduction} and
24+
\kcode{lastprivate}). Other clauses provide vector length preference/restrictions
25+
(\kcode{simdlen} / \kcode{safelen}), loop fusion (\kcode{collapse}), and data
26+
alignment (\kcode{aligned}).
2727

28-
The \code{declare simd} directive designates
28+
The \kcode{declare simd} directive designates
2929
that a vector version of the function should also be constructed for
30-
execution within loops that contain the function and have a \code{simd}
31-
directive. Clauses provide argument specifications (\code{linear},
32-
\code{uniform}, and \code{aligned}), a requested vector length
33-
(\code{simdlen}), and designate whether the function is always/never
34-
called conditionally in a loop (\code{notinbranch}/\code{inbranch}).
30+
execution within loops that contain the function and have a \kcode{simd}
31+
directive. Clauses provide argument specifications (\kcode{linear},
32+
\kcode{uniform}, and \kcode{aligned}), a requested vector length
33+
(\kcode{simdlen}), and designate whether the function is always/never
34+
called conditionally in a loop (\kcode{notinbranch}/\kcode{inbranch}).
3535
The latter is for optimizing performance.
3636

37-
Also, the \code{simd} construct has been combined with the worksharing loop
38-
constructs (\code{for simd} and \code{do simd}) to enable simultaneous thread
37+
Also, the \kcode{simd} construct has been combined with the worksharing loop
38+
constructs (\kcode{for simd} and \kcode{do simd}) to enable simultaneous thread
3939
execution in different SIMD units.
4040
%Hence, the \code{simd} construct can be
4141
%used alone on a loop to direct vectorization (SIMD execution), or in
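
A minimal sketch, not taken from the commit, of the combination the Chap_SIMD.tex text describes: a `declare simd` function called from a loop that carries a `simd` directive. The names `scale_add` and `saxpy_simd` are hypothetical, and the clause choices (`uniform`, `notinbranch`) are assumptions made for illustration. The arrays are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. The directive asks the compiler to also generate a
   vector version: scale has the same value in every SIMD lane (uniform),
   and the function is never called conditionally in the loop (notinbranch). */
#pragma omp declare simd uniform(scale) notinbranch
static float scale_add(float x, float y, float scale)
{
    return scale * x + y;
}

/* Each iteration reads and writes only element i, so there is no
   loop-carried dependence and the loop is a candidate for SIMD execution. */
void saxpy_simd(size_t n, float scale, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = scale_add(x[i], y[i], scale);
}
```

When OpenMP is not enabled, compilers typically ignore the pragmas and the code runs as ordinary scalar C, which makes the result easy to check against the vectorized build.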

Chap_affinity.tex

+16-16
@@ -1,7 +1,7 @@
11
\cchapter{OpenMP Affinity}{affinity}
22
\label{chap:openmp_affinity}
33

4-
OpenMP Affinity consists of a \code{proc\_bind} policy (thread affinity policy) and a specification of
4+
OpenMP Affinity consists of a \kcode{proc_bind} policy (thread affinity policy) and a specification of
55
places (``location units'' or \plc{processors} that may be cores, hardware
66
threads, sockets, etc.).
77
OpenMP Affinity enables users to bind computations to specific places.
@@ -11,13 +11,13 @@
1111
if two or more cores (hardware threads, sockets, etc.) have been assigned to a given place.
1212

1313
Often the binding can be managed without resorting to explicitly setting places.
14-
Without the specification of places in the \code{OMP\_PLACES} variable,
14+
Without the specification of places in the \kcode{OMP_PLACES} variable,
1515
the OpenMP runtime will distribute and bind threads using the entire range of processors for
16-
the OpenMP program, according to the \code{OMP\_PROC\_BIND} environment variable
17-
or the \code{proc\_bind} clause. When places are specified, the OMP runtime
16+
the OpenMP program, according to the \kcode{OMP_PROC_BIND} environment variable
17+
or the \kcode{proc_bind} clause. When places are specified, the OMP runtime
1818
binds threads to the places according to a default distribution policy, or
19-
those specified in the \code{OMP\_PROC\_BIND} environment variable or the
20-
\code{proc\_bind} clause.
19+
those specified in the \kcode{OMP_PROC_BIND} environment variable or the
20+
\kcode{proc_bind} clause.
2121

2222
In the OpenMP Specifications document a processor refers to an execution unit that
2323
is enabled for an OpenMP thread to use. A processor is a core when there is
@@ -31,7 +31,7 @@
3131

3232
The processors available to a process may be a subset of the system's
3333
processors. This restriction may be the result of a
34-
wrapper process controlling the execution (such as \code{numactl} on Linux systems),
34+
wrapper process controlling the execution (such as \plc{numactl} on Linux systems),
3535
compiler options, library-specific environment variables, or default
3636
kernel settings. For instance, the execution of multiple MPI processes,
3737
launched on a single compute node, will each have a subset of processors as
@@ -53,20 +53,20 @@
5353

5454
Threads of a team are positioned onto places in a compact manner, a
5555
scattered distribution, or onto the primary thread's place, by setting the
56-
\code{OMP\_PROC\_BIND} environment variable or the \code{proc\_bind} clause to
57-
\code{close}, \code{spread}, or \code{primary} (\code{master} has been deprecated), respectively. When
58-
\code{OMP\_PROC\_BIND} is set to FALSE no binding is enforced; and
56+
\kcode{OMP_PROC_BIND} environment variable or the \kcode{proc_bind} clause to
57+
\kcode{close}, \kcode{spread}, or \kcode{primary} (\kcode{master} has been deprecated), respectively. When
58+
\kcode{OMP_PROC_BIND} is set to FALSE no binding is enforced; and
5959
when the value is TRUE, the binding is implementation defined to
60-
a set of places in the \code{OMP\_PLACES} variable or to places
61-
defined by the implementation if the \code{OMP\_PLACES} variable
60+
a set of places in the \kcode{OMP_PLACES} variable or to places
61+
defined by the implementation if the \kcode{OMP_PLACES} variable
6262
is not set.
6363

64-
The \code{OMP\_PLACES} variable can also be set to an abstract name
65-
(\code{threads}, \code{cores}, \code{sockets}) to specify that a place is
64+
The \kcode{OMP_PLACES} variable can also be set to an abstract name
65+
(\kcode{threads}, \kcode{cores}, \kcode{sockets}) to specify that a place is
6666
either a single hardware thread, a core, or a socket, respectively.
67-
This description of the \code{OMP\_PLACES} is most useful when the
67+
This description of the \kcode{OMP_PLACES} is most useful when the
6868
number of threads is equal to the number of hardware threads, cores
69-
or sockets. It can also be used with a \code{close} or \code{spread}
69+
or sockets. It can also be used with a \kcode{close} or \kcode{spread}
7070
distribution policy when the equality doesn't hold.
7171

7272

Chap_data_environment.tex

+18-17
@@ -1,12 +1,12 @@
11
\cchapter{Data Environment}{data_environment}
22
\label{chap:data_environment}
33
The OpenMP \plc{data environment} contains data attributes of variables and
4-
objects. Many constructs (such as \code{parallel}, \code{simd}, \code{task})
4+
objects. Many constructs (such as \kcode{parallel}, \kcode{simd}, \kcode{task})
55
accept clauses to control \plc{data-sharing} attributes
66
of referenced variables in the construct, where \plc{data-sharing} applies to
77
whether the attribute of the variable is \plc{shared},
88
is \plc{private} storage, or has special operational characteristics
9-
(as found in the \code{firstprivate}, \code{lastprivate}, \code{linear}, or \code{reduction} clause).
9+
(as found in the \kcode{firstprivate}, \kcode{lastprivate}, \kcode{linear}, or \kcode{reduction} clause).
1010

1111
The data environment for a device (distinguished as a \plc{device data environment})
1212
is controlled on the host by \plc{data-mapping} attributes, which determine the
@@ -21,57 +21,57 @@
2121

2222
Certain variables and objects have predetermined attributes.
2323
A commonly found case is the loop iteration variable in associated loops
24-
of a \code{for} or \code{do} construct. It has a private data-sharing attribute.
24+
of a \kcode{for} or \kcode{do} construct. It has a private data-sharing attribute.
2525
Variables with predetermined data-sharing attributes cannot be listed in a data-sharing clause; but there are some
2626
exceptions (mainly concerning loop iteration variables).
2727

2828
Variables with explicitly determined data-sharing attributes are those that are
2929
referenced in a given construct and are listed in a data-sharing attribute
3030
clause on the construct. Some of the common data-sharing clauses are:
31-
\code{shared}, \code{private}, \code{firstprivate}, \code{lastprivate},
32-
\code{linear}, and \code{reduction}. % Are these all of them?
31+
\kcode{shared}, \kcode{private}, \kcode{firstprivate}, \kcode{lastprivate},
32+
\kcode{linear}, and \kcode{reduction}. % Are these all of them?
3333

3434
Variables with implicitly determined data-sharing attributes are those
3535
that are referenced in a given construct, do not have predetermined
3636
data-sharing attributes, and are not listed in a data-sharing
3737
attribute clause of an enclosing construct.
3838
For a complete list of variables and objects with predetermined and
3939
implicitly determined attributes, please refer to the
40-
\plc{Data-sharing Attribute Rules for Variables Referenced in a Construct}
40+
\docref{Data-sharing Attribute Rules for Variables Referenced in a Construct}
4141
subsection of the OpenMP Specifications document.
4242

4343
\bigskip
4444
DATA-MAPPING ATTRIBUTES
4545

46-
The \code{map} clause on a device construct explicitly specifies how the list items in
46+
The \kcode{map} clause on a device construct explicitly specifies how the list items in
4747
the clause are mapped from the encountering task's data environment (on the host)
4848
to the corresponding item in the device data environment (on the device).
4949
The common \plc{list items} are arrays, array sections, scalars, pointers, and
5050
structure elements (members).
5151

5252
Procedures and global variables have predetermined data mapping if they appear
53-
within the list or block of a \code{declare}~\code{target} directive. Also, a C/C++ pointer
53+
within the list or block of a \kcode{declare target} directive. Also, a C/C++ pointer
5454
is mapped as a zero-length array section, as is a C++ variable that is a reference to a pointer.
5555
% Waiting for response from Eric on this.
5656

57-
Without explicit mapping, non-scalar and non-pointer variables within the scope of the \code{target}
58-
construct are implicitly mapped with a \plc{map-type} of \code{tofrom}.
59-
Without explicit mapping, scalar variables within the scope of the \code{target}
57+
Without explicit mapping, non-scalar and non-pointer variables within the scope of the \kcode{target}
58+
construct are implicitly mapped with a \plc{map-type} of \kcode{tofrom}.
59+
Without explicit mapping, scalar variables within the scope of the \kcode{target}
6060
construct are not mapped, but have an implicit firstprivate data-sharing
6161
attribute. (That is, the value of the original variable is given to a private
6262
variable of the same name on the device.) This behavior can be changed with
63-
the \code{defaultmap} clause.
63+
the \kcode{defaultmap} clause.
6464

65-
The \code{map} clause can appear on \code{target}, \code{target data} and
66-
\code{target enter/exit data} constructs. The operations of creation and
65+
The \kcode{map} clause can appear on \kcode{target}, \kcode{target data} and
66+
\kcode{target enter/exit data} constructs. The operations of creation and
6767
removal of device storage as well as assignment of the original list item
6868
values to the corresponding list items may be complicated when the list
6969
item appears on multiple constructs or when the host and device storage
7070
is shared. In these cases the item's reference count, the number of times
71-
it has been referenced (+1 on entry and -1 on exited) in nested (structured)
71+
it has been referenced (increment by 1 on entry and decrement by 1 on exit) in nested (structured)
7272
map regions and/or accumulative (unstructured) mappings, determines the operation.
73-
Details of the \code{map} clause and reference count operation are specified
74-
in the \plc{map Clause} subsection of the OpenMP Specifications document.
73+
Details of the \kcode{map} clause and reference count operation are specified
74+
in the \docref{\kcode{map} Clause} subsection of the OpenMP Specifications document.
7575

7676

7777
%===== Examples Sections =====
@@ -81,6 +81,7 @@
8181
\input{data_environment/fort_loopvar}
8282
\input{data_environment/fort_sp_common}
8383
\input{data_environment/fort_sa_private}
84+
\input{data_environment/fort_shared_var}
8485
\input{data_environment/carrays_fpriv}
8586
\input{data_environment/lastprivate}
8687
\input{data_environment/reduction}
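
A minimal sketch, not taken from the commit, of the data-mapping rules described in the Chap_data_environment.tex text. The function name `sum_on_device` is hypothetical. It assumes OpenMP 4.5 or later semantics, under which an unmapped scalar referenced in a `target` construct is implicitly firstprivate.

```c
#include <assert.h>
#include <stddef.h>

double sum_on_device(int n, const double *v)
{
    assert(n >= 0 && (n == 0 || v != NULL));
    double sum = 0.0;

    /* v[0:n] is an array section copied to the device on entry. sum is a
       scalar: without the explicit map it would be firstprivate on the
       device, and the host copy would keep its initial value of 0.0. */
    #pragma omp target map(to: v[0:n]) map(tofrom: sum)
    {
        for (int i = 0; i < n; i++)
            sum += v[i];
    }
    return sum;
}
```

A `defaultmap` clause could be used to change the implicit scalar behavior instead of mapping `sum` explicitly.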

Chap_devices.tex

+18-17
@@ -1,9 +1,9 @@
11
\cchapter{Devices}{devices}
22
\label{chap:devices}
33

4-
The \code{target} construct consists of a \code{target} directive
5-
and an execution region. The \code{target} region is executed on
6-
the default device or the device specified in the \code{device}
4+
The \kcode{target} construct consists of a \kcode{target} directive
5+
and an execution region. The \kcode{target} region is executed on
6+
the default device or the device specified in the \kcode{device}
77
clause.
88

99
In OpenMP version 4.0, by default, all variables within the lexical
@@ -16,39 +16,39 @@
1616
The constructs that explicitly
1717
create storage, transfer data, and free storage on the device
1818
are categorized as structured and unstructured. The
19-
\code{target} \code{data} construct is structured. It creates
20-
a data region around \code{target} constructs, and is
19+
\kcode{target data} construct is structured. It creates
20+
a data region around \kcode{target} constructs, and is
2121
convenient for providing persistent data throughout multiple
22-
\code{target} regions. The \code{target} \code{enter} \code{data} and
23-
\code{target} \code{exit} \code{data} constructs are unstructured, because
22+
\kcode{target} regions. The \kcode{target enter data} and
23+
\kcode{target exit data} constructs are unstructured, because
2424
they can occur anywhere and do not support a ``structure''
25-
(a region) for enclosing \code{target} constructs, as does the
26-
\code{target} \code{data} construct.
25+
(a region) for enclosing \kcode{target} constructs, as does the
26+
\kcode{target data} construct.
2727

28-
The \code{map} clause is used on \code{target}
28+
The \kcode{map} clause is used on \kcode{target}
2929
constructs and the data-type constructs to map host data. It
30-
specifies the device storage and data movement \code{to} and \code{from}
30+
specifies the device storage and data movement \plc{to} and \plc{from}
3131
the device, and controls the storage duration.
3232

3333
There is an important change in the OpenMP 4.5 specification
3434
that alters the data model for scalar variables and C/C++ pointer variables.
3535
The default behavior for scalar variables and C/C++ pointer variables
36-
in a 4.5 compliant code is \code{firstprivate}. Example
36+
in a 4.5 compliant code is \kcode{firstprivate}. Example
3737
codes that have been updated to reflect this new behavior are
3838
annotated with a description that describes changes required
3939
for correct execution. Often it is a simple matter of mapping
40-
the variable as \code{tofrom} to obtain the intended 4.0 behavior.
40+
the variable as \kcode{tofrom} to obtain the intended 4.0 behavior.
4141

4242
In OpenMP version 4.5 the mechanism for target
4343
execution is specified as occurring through a \plc{target task}.
44-
When the \code{target} construct is encountered a new
45-
\plc{target task} is generated. The \plc{target task}
46-
completes after the \code{target} region has executed and all data
44+
When the \kcode{target} construct is encountered a new
45+
target task is generated. The target task
46+
completes after the \kcode{target} region has executed and all data
4747
transfers have finished.
4848

4949
This new specification does not affect the execution of
5050
pre-4.5 code; it is a necessary element for asynchronous
51-
execution of the \code{target} region when using the new \code{nowait}
51+
execution of the \kcode{target} region when using the new \kcode{nowait}
5252
clause introduced in OpenMP 4.5.
5353

5454

@@ -59,6 +59,7 @@
5959
\input{devices/target_structure_mapping}
6060
\input{devices/target_fort_allocatable_array_mapping}
6161
\input{devices/array_sections}
62+
\input{devices/usm}
6263
\input{devices/C++_virtual_functions}
6364
\input{devices/array_shaping}
6465
\input{devices/target_mapper}
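
A minimal sketch, not taken from the commit, of the structured `target data` region described in the Chap_devices.tex text, enclosing two `target` regions that reuse the same device storage. The function name `scale_then_shift` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

void scale_then_shift(int n, double *a, double s, double t)
{
    assert(n >= 0 && (n == 0 || a != NULL));

    /* Structured data region: device storage for a[0:n] is created and
       filled on entry, then copied back and released on exit. */
    #pragma omp target data map(tofrom: a[0:n])
    {
        /* a[0:n] is already present on the device, so this map only
           increments its reference count; no extra transfer is needed.
           The scalars n and s are implicitly firstprivate. */
        #pragma omp target map(tofrom: a[0:n])
        for (int i = 0; i < n; i++)
            a[i] *= s;

        #pragma omp target map(tofrom: a[0:n])
        for (int i = 0; i < n; i++)
            a[i] += t;
    }
}
```

The unstructured `target enter data` and `target exit data` constructs could express the same lifetime without an enclosing block, which is useful when allocation and release happen in different functions.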

Chap_directives.tex

+15-15
@@ -2,7 +2,7 @@
22
\label{chap:directive_syntax}
33
\index{directive syntax}
44

5-
OpenMP \emph{directives} use base-language mechanisms to specify OpenMP program behavior.
5+
OpenMP \plc{directives} use base-language mechanisms to specify OpenMP program behavior.
66
In C code, the directives are formed exclusively with pragmas, whereas in C++
77
code, directives are formed from either pragmas or attributes.
88
Fortran directives are formed with comments in free form and fixed form sources (codes).
@@ -20,36 +20,36 @@
2020

2121
C/C++ pragmas
2222
\begin{indentedcodelist}
23-
\code{\#pragma omp} \plc{directive-specification}
23+
\kcode{\#pragma omp} \plc{directive-specification}
2424
\end{indentedcodelist}
2525

2626
C++ attributes
2727
\begin{indentedcodelist}
28-
\code{[[omp :: directive(} \plc{directive-specification} \code{)]]}
29-
\code{[[using omp : directive(} \plc{directive-specification} \code{)]]}
28+
\kcode{[[omp :: directive( \plc{directive-specification} )]]}
29+
\kcode{[[using omp : directive( \plc{directive-specification} )]]}
3030
\end{indentedcodelist}
3131

3232
Fortran comments
3333
\begin{indentedcodelist}
34-
\code{!\$omp} \plc{directive-specification}
34+
\scode{!$omp} \plc{directive-specification}
3535
\end{indentedcodelist}
3636
37-
where \code{c\$omp} and \code{*\$omp} may be used in Fortran fixed form sources.
37+
where \scode{c$omp} and \scode{*$omp} may be used in Fortran fixed form sources.
3838
3939
Most OpenMP directives accept clauses that alter the semantics of the directive in some way,
4040
and some directives also accept parenthesized arguments that follow the directive name.
41-
A clause may just be a keyword (e.g., \scode{untied}) or it may also accept argument lists
42-
(e.g., \scode{shared(x,y,z)}) and/or optional modifiers (e.g., \scode{tofrom} in
43-
\scode{map(tofrom:}~\scode{x,y,z)}).
41+
A clause may just be a keyword (e.g., \kcode{untied}) or it may also accept argument lists
42+
(e.g., \kcode{shared(\ucode{x,y,z})}) and/or optional modifiers (e.g., \kcode{tofrom} in
43+
\kcode{map(tofrom: \ucode{x,y,z})}).
4444
Clause modifiers may be ``simple'' or ``complex'' -- a complex modifier consists of a
4545
keyword followed by one or more parameters, bracketed by parentheses, while a simple
46-
modifier does not. An example of a complex modifier is the \scode{iterator} modifier,
47-
as in \scode{map(iterator(i=0:n),}~\scode{tofrom:}~\scode{p[i])}, or the \scode{step} modifier, as in
48-
\scode{linear(x:}~\scode{ref,}~\scode{step(4))}.
49-
In the preceding examples, \scode{tofrom} and \scode{ref} are simple modifiers.
46+
modifier does not. An example of a complex modifier is the \kcode{iterator} modifier,
47+
as in \kcode{map(iterator(\ucode{i=0:n}), tofrom: \ucode{p[i]})}, or the \kcode{step} modifier, as in
48+
\kcode{linear(\ucode{x}: ref, step(\ucode{4}))}.
49+
In the preceding examples, \kcode{tofrom} and \kcode{ref} are simple modifiers.
5050
51-
For Fortran, a declarative directive (such as \code{declare}~\code{reduction})
52-
must appear after any \code{USE}, \code{IMPORT}, and \code{IMPLICIT} statements
51+
For Fortran, a declarative directive (such as \kcode{declare reduction})
52+
must appear after any \bcode{USE}, \bcode{IMPORT}, and \bcode{IMPLICIT} statements
5353
in the specification part.
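
A minimal sketch, not taken from the commit, of the C pragma form described in the Chap_directives.tex text, showing a clause that takes an argument list and a clause that is just a keyword. The function name `init_two` is hypothetical, and `nowait` is used here as the keyword-only clause in place of the `untied` example in the text. The two arrays are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>

void init_two(int n, double *a, double *b)
{
    assert(n >= 0 && (n == 0 || (a != NULL && b != NULL)));

    /* shared(a, b): a clause that accepts an argument list. */
    #pragma omp parallel shared(a, b)
    {
        /* nowait: a clause that is just a keyword. Skipping the barrier is
           safe here because the next loop writes only to b. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            a[i] = 1.0;

        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = 2.0;
    }
}
```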
5454
5555
