Skip to content

NYU-HPC23/lecture3

Repository files navigation

Screencast link: https://www.youtube.com/watch?v=G32QMsgr-jM

Last Class

  • Measuring wall-time
  • Calculating FLOP/s and MOP/s
  • Compiler optimization flags: -Ox, -march=native, -g
  • Brief overview of different kinds of parallelism:
    • instruction level (vectorization, pipelining)
    • shared memory (OpenMP)
    • distributd memory (MPI)
  • Tools: ssh, modules

Definitions

  • Arithmetic intensity
  • Latency
  • Bandwidth
  • Cache
  • Cache line
  • Virtual memory, Translation Lookaside Buffer (TLB), Memory pages, Page fault, First touch

Valgrind

--tool=memcheck (default)

  • Out-of-bound array access
  • Uninitialized variables (--track-origins=yes)
  • Memory leaks (--leak-check=full)
  • Attach GDB (--vgdb-error=0)

--tool=callgrind (for profiling/timing)

  • kcachegrind
  • Optional agruments --dump-instr=yes --collect-jumps=yes
  • cg_annotate --auto=yes
  • On Mac: brew install qcachegrind --with-graphviz

--tool=cachegrind (for cache misses)

  • From callgrind --tool=callgrind --cache-sim=yes

References

Takeaway

  • Memory allocations are expensive (i.e. cost of first touch not malloc)
  • Keep most frequently used data in registers, then L1 cache, L2 cache, L3 cache and main memory
  • Regular memory access patterns (implicit prefetch) or explicit prefetch to hide memory latency
  • Valgrind to debug memory errors, profile code and find cache misses

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages