***************************************
C Language Features Used in This Course
***************************************
This documents summarize the C language features frequently employed in this
course.

Compiler
========
gcc
---
We choose the ``gcc`` C language compiler from GCC (GNU Compiler Collection) as
our official compiler. This is the default C compiler on Linux (including the
Windows subsystem of Linux, WSL). Use other compilers at your own risk.

.. warning::

  The ``gcc`` provided with the ``MinGW`` package under Windows is not totally
  compatible to the ``gcc`` we use. Use it at your own risk.

.. warning::

  The ``gcc`` from the XCode Command-line tool on Mac OS is an alias of
  ``clang``, which is the LLVM C compiler. It has some known differences with
  the ``OpenMP`` for ``gcc`` covered in class.

Compilation Command
-------------------
A typical ``gcc`` compilation command looks like::

  gcc <flags> <arguments>

+ ``<flags>`` are the compilation flags
+ ``<arguments>`` are the source files (.c) and libraries (.o)

Useful Compilation Flags
------------------------
+ ``-O`` specifies the compilation optimization level

  * ``-O0`` reduced optimization, default level, use as baseline
  * ``-O2``
  * ``-O3`` recommended
  * ``-Ofast`` with non-standard-compliant optimizations

+ ``-I`` specifies the path to include headers

  * No space between ``-I`` and the path!
  * ``-I.`` means to look for headers in the current directory

+ ``-l`` specifies the libraries to use

  * ``-lpthread`` POSIX thread library

+ ``-f`` specifies flags

  * ``-fpic``  Position dependent
  * ``-fopenmp``  enable OpenMP support

+ ``-m`` specifies the architecture

  * ``-march=native``  use the native architecture
  * ``-m64``  use 64-bit architecture
  * ``-msse4.2``  use SSE4.2 instruction set

+ ``-std`` specifies the C language standard

  * ``-std=gnu90``
  * ``-std=gnu11``
  * ``-std=c99 -pedantic``
  * ``-std=c11 -pedantic``

Performance Optimization in C Language
======================================
+ Compilation optimization

  * ``-O2``
  * ``-O3``
  * ``-Ofast``
  * ``-ffast-math``

+ Selected Optimization Approaches

  * Loop unrolling
  * Cache awareness

    - Optimize the memory access pattern for better cache performance
    - Loop interchange
    - Loop tiling

  * Vectorization (hardware architecture dependent)

    - Intel MMX, SSE, AVX

+ Memory operations

  Block memory operations are usually faster than element-wise operations. The strategy is to use block memory operations whenever possible. The following functions are frequently used in this course:

  * Memory allocation/deallocation

    - ``malloc`` and ``free``
    - ``calloc`` and ``free``
    - ``aligned_alloc`` and ``free`` (C11)

  * Memory block copy/set

    - ``memcpy``
    - ``memset``