OpenMP
 Chap_SIMD.tex                 | +7  -3
 Chap_affinity.tex             | +14 -7
 Chap_data_environment.tex     | +20 -2
 Chap_devices.tex              | +24 -2
 Chap_directives.tex           | +45
 Chap_loop_transformations.tex | +25
 Chap_memory_model.tex         | +8  -2
 Chap_ompt_interface.tex       | +19
 Chap_parallel_execution.tex   | +35 -11

diff --git a/Chap_SIMD.tex b/Chap_SIMD.tex
@@ -1,5 +1,4 @@
-\pagebreak
-\chapter{SIMD}
+\cchapter{SIMD}{SIMD}
 \label{chap:simd}
 
 Single instruction, multiple data (SIMD) is a form of parallel execution 
@@ -12,7 +11,7 @@ \chapter{SIMD}
 Loops without loop-carried backward dependency (or with dependency preserved using 
 ordered simd) are candidates for vectorization by the compiler for 
 execution with SIMD units. In addition, with state-of-the-art vectorization 
-technology and \code{declare simd} construct extensions for function vectorization
+technology and \code{declare simd} directive extensions for function vectorization
 in the OpenMP 4.5 specification, loops with function calls can be vectorized as well. 
 The basic idea is that a scalar function call in a loop can be replaced by a vector version 
 of the function, and the loop can be vectorized simultaneously by combining a loop 
@@ -46,3 +45,8 @@ \chapter{SIMD}
 %\code{parallel for simd}).
 
 
+%===== Examples Sections =====
+\input{SIMD/SIMD}
+\input{SIMD/linear_modifier}
+
+
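As an illustration of the \code{declare simd} paragraph in the SIMD chapter, the following is a minimal C99 sketch, not taken from the OpenMP Examples sources; \code{scaled\_square} and \code{apply\_scaled\_square} are hypothetical names. It assumes OpenMP (or OpenMP SIMD) support is enabled, for example with \code{-fopenmp} or \code{-fopenmp-simd}; without it the pragmas are ignored and the loop runs serially with the same result.

```c
#include <stddef.h>

/* Request a SIMD (vector) variant of this function in addition to the
   scalar one. uniform(scale) states that scale has the same value in
   every SIMD lane of a vectorized call. */
#pragma omp declare simd uniform(scale)
float scaled_square(float x, float scale)
{
    return scale * x * x;
}

/* No loop-carried dependence: each iteration writes a distinct out[i],
   so the loop, including the function call, is a vectorization candidate. */
void apply_scaled_square(const float *in, float *out, size_t n, float scale)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        out[i] = scaled_square(in[i], scale);
}
```

The \code{uniform} clause is what lets the vector variant take \code{scale} as a single scalar instead of a vector of identical values.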
diff --git a/Chap_affinity.tex b/Chap_affinity.tex
@@ -1,5 +1,4 @@
-\pagebreak
-\chapter{OpenMP Affinity}
+\cchapter{OpenMP Affinity}{affinity}
 \label{chap:openmp_affinity}
 
 OpenMP Affinity consists of a \code{proc\_bind} policy (thread affinity policy) and a specification of
@@ -53,21 +52,21 @@ \chapter{OpenMP Affinity}
 %which sets \code{OMP\_PLACES} specifically for the MPI process. 
 
 Threads of a team are positioned onto places in a compact manner, a 
-scattered distribution, or onto the master's place, by setting the 
+scattered distribution, or onto the primary thread's place, by setting the 
 \code{OMP\_PROC\_BIND} environment variable or the \code{proc\_bind} clause  to 
-\plc{close}, \plc{spread}, or \plc{master}, respectively.  When 
+\code{close}, \code{spread}, or \code{primary}, respectively (the \code{master} keyword has been deprecated in favor of \code{primary}).  When 
 \code{OMP\_PROC\_BIND} is set to FALSE no binding is enforced; and 
 when the value is TRUE, the binding is implementation defined to 
 a set of places in the \code{OMP\_PLACES} variable or to places 
 defined by the implementation if the \code{OMP\_PLACES} variable 
-is not set.
+is not set. 
 
 The \code{OMP\_PLACES} variable can also be set to an abstract name 
-(\plc{threads}, \plc{cores}, \plc{sockets}) to specify that a place is
+(\code{threads}, \code{cores}, \code{sockets}) to specify that a place is
 either a single hardware thread, a core, or a socket, respectively. 
 This description of the \code{OMP\_PLACES} is most useful when the 
-number of threads is equal to the number of hardware thread, cores
+number of threads is equal to the number of hardware threads, cores
-or sockets.  It can also be used with a \plc{close} or \plc{spread} 
+or sockets.  It can also be used with a \code{close} or \code{spread} 
 distribution policy when the equality doesn't hold.
 
 
@@ -116,3 +115,11 @@ \chapter{OpenMP Affinity}
 %     thread #     0  * * * *   _ _ _ _   _ _  _  _   #mask for thread 0
 %     thread #     0  _ _ _ _   * * * *   _ _  _  _   #mask for thread 1
 %     thread #     0  _ _ _ _   _ _ _ _   * *  *  *   #mask for thread 2
+
+
+%===== Examples Sections =====
+\input{affinity/affinity}
+\input{affinity/task_affinity}
+\input{affinity/affinity_display}
+\input{affinity/affinity_query}
+
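To illustrate the \code{proc\_bind} policies described in the affinity chapter, here is a minimal sketch (the function name \code{record\_places} is hypothetical) that requests the \code{spread} policy and records the place each thread is bound to, using the OpenMP 4.5 routine \code{omp\_get\_place\_num}. The \code{\_OPENMP} guards are an assumption of this sketch so that it also compiles, as a single-thread run, when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Runs a team of at most max_threads threads with the spread policy and
   stores in place_of[t] the place number of thread t (-1 means the thread
   is not bound to a place). Returns the team size.
   place_of must have room for max_threads entries. */
int record_places(int *place_of, int max_threads)
{
    int team_size = 1;

    #pragma omp parallel num_threads(max_threads) proc_bind(spread)
    {
        int tid = 0;
        int place = -1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        place = omp_get_place_num();
        #pragma omp single
        team_size = omp_get_num_threads();
#endif
        place_of[tid] = place;   /* each thread writes its own slot */
    }
    return team_size;
}
```

Running the program with, for example, \code{OMP\_PLACES=cores} in the environment makes each place a core; replacing \code{spread} with \code{close} or \code{primary} exercises the other policies (\code{primary} assumes an OpenMP 5.1 compiler).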
diff --git a/Chap_data_environment.tex b/Chap_data_environment.tex
@@ -1,5 +1,4 @@
-\pagebreak
-\chapter{Data Environment}
+\cchapter{Data Environment}{data_environment}
 \label{chap:data_environment}
 The OpenMP \plc{data environment} contains data attributes of variables and
 objects.  Many constructs (such as \code{parallel}, \code{simd}, \code{task}) 
@@ -73,3 +72,22 @@ \chapter{Data Environment}
 map regions and/or accumulative (unstructured) mappings, determines the operation.
 Details of the \code{map} clause and reference count operation are specified 
 in the \plc{map Clause} subsection of the OpenMP Specifications document.
+
+
+%===== Examples Sections =====
+\input{data_environment/threadprivate}
+\input{data_environment/default_none}
+\input{data_environment/private}
+\input{data_environment/fort_loopvar}
+\input{data_environment/fort_sp_common}
+\input{data_environment/fort_sa_private}
+\input{data_environment/carrays_fpriv}
+\input{data_environment/lastprivate}
+\input{data_environment/reduction}
+\input{data_environment/udr}
+\input{data_environment/scan}
+\input{data_environment/copyin}
+\input{data_environment/copyprivate}
+\input{data_environment/cpp_reference}
+\input{data_environment/associate}
+
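As a small illustration of data-sharing attributes in the data environment (in the spirit of the \code{default\_none} and \code{reduction} sections listed above, but not taken from them), the sketch below gives every variable referenced in a \code{parallel for} an explicit attribute. The function name \code{scaled\_sum} is hypothetical.

```c
/* default(none) requires every variable referenced in the construct to
   receive an explicit data-sharing attribute (the loop variable i is
   predetermined private):
     - v is shared: all threads read the same array through it;
     - n and scale are firstprivate: each thread gets an initialized copy;
     - total is a reduction variable: per-thread partial sums are combined
       with + at the end of the loop. */
double scaled_sum(const double *v, int n, double scale)
{
    double total = 0.0;

    #pragma omp parallel for default(none) shared(v) firstprivate(n, scale) \
            reduction(+:total)
    for (int i = 0; i < n; i++)
        total += scale * v[i];

    return total;
}
```

Omitting any of these clauses makes the compiler reject the construct, which is the point of \code{default(none)}: no variable silently inherits an attribute.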
diff --git a/Chap_devices.tex b/Chap_devices.tex
@@ -1,5 +1,4 @@
-\pagebreak
-\chapter{Devices}
+\cchapter{Devices}{devices}
 \label{chap:devices}
 
 The \code{target} construct consists of a \code{target} directive 
@@ -51,3 +50,26 @@ \chapter{Devices}
 pre-4.5 code; it is a necessary element for asynchronous 
 execution of the \code{target} region when using the new \code{nowait} 
 clause introduced in OpenMP 4.5.
+
+
+%===== Examples Sections =====
+\input{devices/target}
+\input{devices/target_defaultmap}
+\input{devices/target_pointer_mapping}
+\input{devices/target_structure_mapping}
+\input{devices/target_fort_allocatable_array_mapping}
+\input{devices/array_sections}
+\input{devices/array_shaping}
+\input{devices/target_mapper}
+\input{devices/target_data}
+\input{devices/target_unstructured_data}
+\input{devices/target_update}
+\input{devices/target_associate_ptr}
+\input{devices/declare_target}
+\input{devices/teams}
+\input{devices/async_target_depend}
+\input{devices/async_target_with_tasks}
+\input{devices/async_target_nowait}
+\input{devices/async_target_nowait_depend}
+\input{devices/device}
+
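The devices chapter mentions asynchronous execution of \code{target} regions with the \code{nowait} clause of OpenMP 4.5. The sketch below (hypothetical function name \code{vadd\_then\_scale}) chains two asynchronous target tasks with \code{depend} clauses and waits for them with \code{taskwait}. It assumes that, when no offload device is available, the regions execute on the host (the default unless offloading is made mandatory, e.g. with \code{OMP\_TARGET\_OFFLOAD=MANDATORY}); without OpenMP the pragmas are ignored and the loops run serially.

```c
/* Two target regions made asynchronous with nowait. Each becomes a
   deferrable target task; the depend clauses order the second task after
   the first because both name c[0:n]. */
void vadd_then_scale(const float *a, const float *b, float *c, int n, float s)
{
    /* c = a + b on the device: a and b are copied in, c is copied out. */
    #pragma omp target teams distribute parallel for nowait \
            map(to: a[0:n], b[0:n]) map(from: c[0:n]) depend(out: c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];

    /* c = s * c: starts only after the first target task has completed. */
    #pragma omp target teams distribute parallel for nowait \
            map(tofrom: c[0:n]) depend(inout: c[0:n])
    for (int i = 0; i < n; i++)
        c[i] *= s;

    /* Wait for both target tasks before the caller reads c. */
    #pragma omp taskwait
}
```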
diff --git a/Chap_directives.tex b/Chap_directives.tex
@@ -0,0 +1,45 @@
+\cchapter{OpenMP Directive Syntax}{directives}
+\label{chap:directive_syntax}
+
+OpenMP \emph{directives} use base-language mechanisms to specify OpenMP program behavior.
+In C code, the directives are formed exclusively with pragmas, whereas in C++
+code, directives are formed from either pragmas or attributes.
+Fortran directives are formed with comments in both free-form and fixed-form source.
+All of these mechanisms allow the compiler to ignore the OpenMP directives if
+OpenMP is not supported or enabled.
+
+
+An OpenMP directive is a combination of the base-language mechanism and a \plc{directive-specification},
+as shown below. The \plc{directive-specification} consists
+of the \plc{directive-name}, which occasionally takes arguments,
+followed by optional \plc{clauses}. Full details of the syntax can be found in the OpenMP Specification.
+Illustrations of the syntax are given in the examples.
+
+The formats for combining a base-language mechanism and a \plc{directive-specification} are:
+
+C/C++ pragmas
+\begin{indentedcodelist}
+\code{\#pragma omp} \plc{directive-specification}
+\end{indentedcodelist}
+
+C++ attributes
+\begin{indentedcodelist}
+\code{[[omp :: directive(} \plc{directive-specification} \code{)]]}
+\code{[[using omp : directive(} \plc{directive-specification} \code{)]]}
+\end{indentedcodelist}
+
+Fortran comments
+\begin{indentedcodelist}
+\code{!\$omp} \plc{directive-specification}
+\end{indentedcodelist}
+
+where \code{c\$omp} and \code{*\$omp} may also be used in Fortran fixed-form sources.
+
+
+%===== Examples Sections =====
+\input{directives/pragmas}
+\input{directives/attributes}
+\input{directives/fixed_format_comments}
+\input{directives/free_format_comments}
+
+
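To make the pragma form and the "ignored when OpenMP is not enabled" property concrete, here is a minimal C sketch; the function names \code{sum\_to} and \code{openmp\_version} are hypothetical. The C++ attribute spelling appears only in a comment, since it assumes a compiler with OpenMP 5.1 attribute support.

```c
/* _OPENMP is defined only when OpenMP is enabled; its value is the
   yyyymm date of the supported specification. */
long openmp_version(void)
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

/* Sum of 1..n. Pragma form: "#pragma omp" followed by the
   directive-specification, here the directive-name "parallel for" and
   the clause "reduction(+:sum)". In C++ with OpenMP 5.1 attribute
   support, the same directive could be written as
       [[omp::directive(parallel for reduction(+:sum))]]
   When OpenMP is not enabled, the pragma is ignored and the loop runs
   serially with the same result. */
long sum_to(int n)
{
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++)
        sum += i;
    return sum;
}
```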
diff --git a/Chap_loop_transformations.tex b/Chap_loop_transformations.tex
@@ -0,0 +1,25 @@
+\cchapter{Loop Transformations}{loop_transformations}
+\label{chap:loop_transformations}
+
+To obtain better performance on a platform, code may need to be restructured 
+relative to the way it is written (which is often chosen for readability).
+User-directed loop transformations accomplish this goal by providing a means 
+to separate the semantics of code from its optimization.
+
+A loop transformation construct states that a transformation operation is to be 
+performed on a set of nested loops.  This directive approach can target specific loops
+for transformation, rather than relying on general, often more time-consuming 
+compiler heuristics (enabled through compiler options) that may not be able to discover 
+optimal transformations.
+
+Loop transformations can be augmented by preprocessor support or OpenMP \code{metadirective} 
+directives to select optimal dimensions and size parameters for specific platforms,
+facilitating a single code base for multiple platforms.
+Moreover, directive-based transformations make experimentation easier: 
+transformation directives can be applied selectively to specific hot spots.
+
+
+%===== Examples Sections =====
+\input{loop_transformations/tile}
+\input{loop_transformations/unroll}
+
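The \code{tile} and \code{unroll} constructs named in the loop transformations chapter can be sketched as follows; \code{transpose\_tiled} and \code{sum\_unrolled} are hypothetical names. The sketch assumes a compiler that supports these OpenMP 5.1 constructs; a compiler that does not may warn about and ignore the pragmas, and because transformations do not change program semantics the results are the same either way.

```c
/* Transpose the n x n row-major matrix a into b. tile sizes(4, 4) asks
   that the two perfectly nested loops be blocked into 4 x 4 tiles, which
   can improve cache reuse; partial tiles at the edges are still executed,
   so the result matches the untransformed loops. */
void transpose_tiled(int n, const double *a, double *b)
{
    #pragma omp tile sizes(4, 4)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            b[j * n + i] = a[i * n + j];
}

/* unroll partial(4) asks for the loop body to be replicated four times
   per iteration of the generated loop; iterations left over when n is
   not a multiple of 4 are still executed. */
long sum_unrolled(const int *v, int n)
{
    long s = 0;
    #pragma omp unroll partial(4)
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}
```

The tile sizes and unroll factor are illustrative choices; the chapter's point is that such parameters can be tuned per platform without touching the loop bodies.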
diff --git a/Chap_memory_model.tex b/Chap_memory_model.tex
@@ -1,5 +1,4 @@
-\pagebreak
-\chapter{Memory Model}
+\cchapter{Memory Model}{memory_model}
 \label{chap:memory_model}
 
 OpenMP provides a shared-memory model that allows all threads on a given
@@ -129,3 +128,10 @@ \chapter{Memory Model}
 % in \plc{atomic Construct} subsection of the OpenMP Specifications document).
 
 % Examples 1-3 show the difficulty of synchronizing threads through \code{flush} and \code{atomic} directives.
+
+
+%===== Examples Sections =====
+\input{memory_model/mem_model}
+\input{memory_model/allocators}
+\input{memory_model/fort_race}
+
diff --git a/Chap_ompt_interface.tex b/Chap_ompt_interface.tex
@@ -0,0 +1,19 @@
+\cchapter{OMPT Interface}{ompt_interface}
+\label{chap:ompt_interface}
+OMPT defines mechanisms and an API that allow tools to interface with an OpenMP program.
+
+The OMPT API allows a tool to:
+\begin{itemize}
+  \addtolength{\itemindent}{1cm}
+  \item  examine the state associated with an OpenMP thread
+  \item  interpret the call stack of an OpenMP thread
+  \item  receive notification about OpenMP events
+  \item  trace activity on OpenMP target devices
+  \item  assess implementation-dependent details
+  \item  control a tool from an OpenMP application
+\end{itemize}
+
+The following sections illustrate basic mechanisms and operations of the OMPT API.
+
+
+\input{ompt_interface/ompt_start}
diff --git a/Chap_parallel_execution.tex b/Chap_parallel_execution.tex
@@ -1,5 +1,4 @@
-\pagebreak
-\chapter{Parallel Execution}
+\cchapter{Parallel Execution}{parallel_execution}
 \label{chap:parallel_execution}
 
 A single thread, the \plc{initial thread}, begins sequential execution of 
@@ -10,7 +9,7 @@ \chapter{Parallel Execution}
 forming a parallel region.  An \plc{initial thread} encountering a \code{parallel} 
 region forks (creates) a team of threads at the beginning of the 
 \code{parallel} region, and joins them (removes from execution) at the 
-end of the region.  The initial thread becomes the master thread of the team in a 
+end of the region.  The initial thread becomes the primary thread of the team in a 
 \code{parallel} region with a \plc{thread} number equal to zero, the other 
 threads are numbered from 1 to number of threads minus 1. 
 A team may be comprised of just a single thread.
@@ -19,9 +18,9 @@ \chapter{Parallel Execution}
 parallel region. The task that creates a parallel region is suspended while the
 tasks of the team are executed.  A thread is tied to its task; that is,
 only the thread assigned to the task can execute that task.  After completion 
-of the \code{parallel} region, the master thread resumes execution of the generating task.  
+of the \code{parallel} region, the primary thread resumes execution of the generating task.  
 
-%After the \code{parallel} region the master thread becomes the initial 
+%After the \code{parallel} region the primary thread becomes the initial 
 %thread again, and continues to execute the \plc{sequential part}.  
 
 Any task within a \code{parallel} region is allowed to encounter another
@@ -43,7 +42,8 @@ \chapter{Parallel Execution}
 the number of threads becomes an upper limit for the number of threads to be
 provided by the OpenMP runtime.
 
-\pagebreak
+%\pagebreak
+\bigskip
 WORKSHARING CONSTRUCTS
 
 A worksharing construct distributes the execution of the associated region
@@ -96,9 +96,33 @@ \chapter{Parallel Execution}
 by threads of the team.  
 
 \bigskip
-MASTER CONSTRUCT
+MASKED CONSTRUCT
+
+The \code{masked} construct is not a worksharing construct.  By default, the \code{masked} region is
+executed only by the primary thread; a \code{filter} clause can select another thread instead. There is no implicit barrier (and flush) 
+at the end of the \code{masked} region; hence the other threads of the team do not wait and continue
+executing the statements that follow the \code{masked} region.
+The \code{master} construct, which has been deprecated in OpenMP 5.1, has identical semantics
+to the \code{masked} construct with no \code{filter} clause.
+
+
+%===== Examples Sections =====
+\input{parallel_execution/ploop}
+\input{parallel_execution/parallel}
+\input{parallel_execution/host_teams}
+\input{parallel_execution/nthrs_nesting}
+\input{parallel_execution/nthrs_dynamic}
+\input{parallel_execution/fort_do}
+\input{parallel_execution/nowait}
+\input{parallel_execution/collapse}
+\input{parallel_execution/linear_in_loop}
+\input{parallel_execution/psections}
+\input{parallel_execution/fpriv_sections}
+\input{parallel_execution/single}
+\input{parallel_execution/workshare}
+\input{parallel_execution/masked}
+\input{parallel_execution/loop}
+\input{parallel_execution/pra_iterator}
+\input{parallel_execution/set_dynamic_nthrs}
+\input{parallel_execution/get_nthrs}
 
-The \code{master} construct is not a worksharing construct.  The master region is
-is executed only by the master thread. There is no implicit barrier (and flush) 
-at the end of the \code{master} region; hence the other threads of the team continue
-execution beyond code statements beyond the \code{master} region.
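The \code{masked} construct described in the parallel execution chapter can be sketched as follows; \code{who\_ran\_masked} is a hypothetical name. The sketch assumes a compiler supporting the OpenMP 5.1 \code{masked} construct (older compilers would use the deprecated \code{master} construct instead); the \code{\_OPENMP} guards let it also compile as a serial program.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the thread number that executed the masked region. Without a
   filter clause only the primary thread (thread number 0) executes it. */
int who_ran_masked(void)
{
    int runner = -1;

    #pragma omp parallel num_threads(4)
    {
        #pragma omp masked
        {
#ifdef _OPENMP
            runner = omp_get_thread_num();
#else
            runner = 0;   /* serial build: the only thread is thread 0 */
#endif
        }
        /* No implicit barrier here: the other threads do not wait for
           the masked region and continue immediately. */
    }
    /* The implicit barrier at the end of the parallel region makes the
       write to runner visible here. */
    return runner;
}
```

Because only one thread ever writes \code{runner}, there is no data race, and the value is safe to read once the \code{parallel} region has ended.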