*************************************** C Language Features Used in This Course *************************************** This documents summarize the C language features frequently employed in this course. Compiler ======== gcc --- We choose the ``gcc`` C language compiler from GCC (GNU Compiler Collection) as our official compiler. This is the default C compiler on Linux (including the Windows subsystem of Linux, WSL). Use other compilers at your own risk. .. warning:: The ``gcc`` provided with the ``MinGW`` package under Windows is not totally compatible to the ``gcc`` we use. Use it at your own risk. .. warning:: The ``gcc`` from the XCode Command-line tool on Mac OS is an alias of ``clang``, which is the LLVM C compiler. It has some known differences with the ``OpenMP`` for ``gcc`` covered in class. Compilation Command ------------------- A typical ``gcc`` compilation command looks like:: gcc + ```` are the compilation flags + ```` are the source files (.c) and libraries (.o) Useful Compilation Flags ------------------------ + ``-O`` specifies the compilation optimization level * ``-O0`` reduced optimization, default level, use as baseline * ``-O2`` * ``-O3`` recommended * ``-Ofast`` with non-standard-compliant optimizations + ``-I`` specifies the path to include headers * No space between ``-I`` and the path! * ``-I.`` means to look for headers in the current directory + ``-l`` specifies the libraries to use * ``-lpthread`` POSIX thread library + ``-f`` specifies flags * ``-fpic`` Position dependent * ``-fopenmp`` enable OpenMP support + ``-m`` specifies the architecture * ``-march=native`` use the native architecture * ``-m64`` use 64-bit architecture * ``-msse4.2`` use SSE4.2 instruction set + ``-std`` specifies the C language standard * ``-std=gnu90`` * ``-std=gnu11`` * ``-std=c99 -pedantic`` * ``-std=c11 -pedantic`` Performance Optimization in C Language ====================================== + Compilation optimization * ``-O2`` * ``-O3`` * ``-Ofast`` * ``-ffast-math`` + Selected Optimization Approaches * Loop unrolling * Cache awareness - Optimize the memory access pattern for better cache performance - Loop interchange - Loop tiling * Vectorization (hardware architecture dependent) - Intel MMX, SSE, AVX + Memory operations Block memory operations are usually faster than element-wise operations. The strategy is to use block memory operations whenever possible. The following functions are frequently used in this course: * Memory allocation/deallocation - ``malloc`` and ``free`` - ``calloc`` and ``free`` - ``aligned_alloc`` and ``free`` (C11) * Memory block copy/set - ``memcpy`` - ``memset``