Commit 3ed9b959 authored by Jan Eitzinger
parents 86e57fd5 37290677
......@@ -10,6 +10,7 @@ It contains C modules for:
* Accurate timing
Moreover the benchmark showcases a simple generic Makefile that can be used in other projects.
You may want to have a look at https://github.com/RRZE-HPC/TheBandwidthBenchmark/wiki for a collection of results that were created using TheBandwidthBenchmark.
## Overview
......@@ -88,9 +89,9 @@ To run the benchmark call:
The benchmark will output the results similar to the stream benchmark. Results are validated.
For threaded execution it is recommended to control thread affinity.
We recommend to use likwid-pin for benchmarking:
We recommend to use likwid-pin for setting the number of threads used and to control thread affinity:
```
likwid-pin -c 0-3 ./bwbench-GCC
likwid-pin -C 0-3 ./bwbench-GCC
```
Example output for threaded execution:
......@@ -118,3 +119,42 @@ SDaxpy: 46822.63 23411.32 0.0281 0.0273 0.0325
Solution Validates
```
## Scaling runs
Apart from the highest sustained memory bandwidth, the scaling behavior within memory domains is also an important system property.
A helper script in util (```extractResults.pl```) creates a text result file from multiple runs, which can be used as input to plotting applications such as gnuplot and xmgrace.
This involves two steps: Executing the benchmark runs and creating the data file.
To run the benchmark for different thread counts within a memory domain execute (this assumes bash or zsh):
```
$ for nt in 1 2 4 6 8 10; do likwid-pin -q -C E:M0:$nt:1:2 ./bwbench-ICC > dat/emmy-$nt.txt; done
```
It is recommended to use just one thread per core if the processor supports hyperthreading.
Use whatever stepping you like; here a stepping of two was used.
The ```-q``` option suppresses output from ```likwid-pin```.
The line above uses the expression-based syntax. On systems with hyperthreading enabled (check with, e.g., ```likwid-topology```) you have to skip the other hardware threads on each core.
For the system above with 2 hardware threads per core this results in ```-C E:M0:$nt:1:2```, whereas on a system with 4 hardware threads per core you would need ```-C E:M0:$nt:1:4```.
The string before the dash (here emmy) can be arbitrary, but the extraction script expects the thread count after the dash.
The file ending also has to be ```.txt```.
Please inspect some of the result files in a text editor to check that everything worked as expected.
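The naming convention described above can be checked before running the extraction script. The following is a minimal C sketch under stated assumptions, not the logic of ```extractResults.pl``` itself: the function name ```threadCountFromName``` is hypothetical, and it assumes the thread count follows the last dash in the file name.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the thread count encoded in a result file
 * name of the form <prefix>-<nt>.txt (e.g. "emmy-10.txt"), or -1 if the
 * name does not follow that convention. */
int threadCountFromName(const char* name)
{
    /* Assumption: the thread count follows the LAST dash in the name. */
    const char* dash = strrchr(name, '-');
    if (dash == NULL || dash == name) return -1;   /* no dash or empty prefix */

    /* The character right after the dash must be a digit. */
    if (!isdigit((unsigned char)dash[1])) return -1;

    char* end;
    long nt = strtol(dash + 1, &end, 10);
    if (nt <= 0 || nt > INT_MAX) return -1;        /* thread count must be positive */

    /* Everything after the number must be exactly the ".txt" ending. */
    if (strcmp(end, ".txt") != 0) return -1;

    return (int)nt;
}
```

With this sketch, ```threadCountFromName("emmy-10.txt")``` yields 10, while names such as ```emmy.txt``` or ```emmy-10.dat``` are rejected with -1.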
To extract the results and output them in a plottable format, execute:
```
./extractResults.pl ./dat
```
The script will pick up all result files in the directory specified and create a column format output file.
In this case:
```
#nt Init Sum Copy Update Triad Daxpy STriad SDaxpy
1 4109 11900 5637 8025 7407 9874 8981 11288
2 8057 22696 11011 15174 14821 18786 17599 21475
4 15602 39327 21020 28197 27287 33633 31939 37146
6 22592 45877 29618 37155 36664 40259 39911 41546
8 28641 46878 35763 40111 40106 41293 41022 41950
10 33151 46741 38187 40269 39960 40922 40567 41606
```
Please be aware that the single-core memory bandwidth as well as the scaling behavior depend on the frequency settings.
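One way to quantify the scaling behavior from such a table is to relate each run to the single-thread result. The following C sketch is only an illustration and not part of the benchmark: the function names ```scalingSpeedup```, ```parallelEfficiency``` and ```printTriadScaling``` are hypothetical, and the numbers are the Triad column of the table above, in the units printed by the extraction script.

```c
#include <stdio.h>

/* Speedup of a multi-threaded run relative to the single-thread run. */
double scalingSpeedup(double bwSingle, double bwThreads)
{
    return bwThreads / bwSingle;
}

/* Parallel efficiency: speedup divided by the number of threads used. */
double parallelEfficiency(double bwSingle, double bwThreads, int nt)
{
    return scalingSpeedup(bwSingle, bwThreads) / nt;
}

/* Prints speedup and efficiency for the Triad column of the table above. */
void printTriadScaling(void)
{
    const int    threads[] = { 1, 2, 4, 6, 8, 10 };
    const double triad[]   = { 7407, 14821, 27287, 36664, 40106, 39960 };
    const int    n         = sizeof(threads) / sizeof(threads[0]);

    printf("#nt speedup efficiency\n");
    for (int i = 0; i < n; i++) {
        printf("%d %.2f %.2f\n", threads[i],
               scalingSpeedup(triad[0], triad[i]),
               parallelEfficiency(triad[0], triad[i], threads[i]));
    }
}
```

For the Triad column, going from 1 to 10 threads gives a speedup of about 5.4, and the step from 8 to 10 threads no longer increases the bandwidth, which indicates that the memory domain is saturated.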
......@@ -3,7 +3,8 @@ GCC = gcc
LINKER = $(CC)
ifeq ($(ENABLE_OPENMP),true)
OPENMP = -fopenmp
OPENMP = -Xpreprocessor -fopenmp
LIBS = -lomp
endif
VERSION = --version
......@@ -12,4 +13,3 @@ CFLAGS = -Ofast -std=c99 $(OPENMP)
LFLAGS = $(OPENMP)
DEFINES = -D_GNU_SOURCE
INCLUDES =
LIBS =
......@@ -2,7 +2,7 @@
* =======================================================================================
*
* Author: Jan Eitzinger (je), jan.eitzinger@fau.de
* Copyright (c) 2019 RRZE, University Erlangen-Nuremberg
* Copyright (c) 2020 RRZE, University Erlangen-Nuremberg
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#ifdef __linux__
#ifdef _OPENMP
#include <stdlib.h>
......@@ -38,8 +37,7 @@
#define MAX_NUM_THREADS 128
#define gettid() syscall(SYS_gettid)
static int
getProcessorID(cpu_set_t* cpu_set)
static int getProcessorID(cpu_set_t* cpu_set)
{
int processorId;
......@@ -53,8 +51,7 @@ getProcessorID(cpu_set_t* cpu_set)
return processorId;
}
int
affinity_getProcessorId()
int affinity_getProcessorId()
{
cpu_set_t cpu_set;
CPU_ZERO(&cpu_set);
......@@ -63,8 +60,7 @@ affinity_getProcessorId()
return getProcessorID(&cpu_set);
}
void
affinity_pinThread(int processorId)
void affinity_pinThread(int processorId)
{
cpu_set_t cpuset;
pthread_t thread;
......@@ -75,8 +71,7 @@ affinity_pinThread(int processorId)
pthread_setaffinity_np(thread, sizeof(cpu_set_t), &cpuset);
}
void
affinity_pinProcess(int processorId)
void affinity_pinProcess(int processorId)
{
cpu_set_t cpuset;
......
......@@ -2,7 +2,7 @@
* =======================================================================================
*
* Author: Jan Eitzinger (je), jan.eitzinger@fau.de
* Copyright (c) 2019 RRZE, University Erlangen-Nuremberg
* Copyright (c) 2020 RRZE, University Erlangen-Nuremberg
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
......
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#include <timing.h>
double copy(
......
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#include <timing.h>
double daxpy(
......@@ -38,9 +37,9 @@ double daxpy(
S = getTimeStamp();
#pragma omp parallel for schedule(static)
for (int i=0; i<N; i++) {
a[i] = a[i] + scalar * b[i];
}
for (int i=0; i<N; i++) {
a[i] = a[i] + scalar * b[i];
}
E = getTimeStamp();
return E-S;
......
......@@ -2,7 +2,7 @@
* =======================================================================================
*
* Author: Jan Eitzinger (je), jan.eitzinger@fau.de
* Copyright (c) 2019 RRZE, University Erlangen-Nuremberg
* Copyright (c) 2020 RRZE, University Erlangen-Nuremberg
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#ifndef AFFINITY_H
#define AFFINITY_H
......
......@@ -2,7 +2,7 @@
* =======================================================================================
*
* Author: Jan Eitzinger (je), jan.eitzinger@fau.de
* Copyright (c) 2019 RRZE, University Erlangen-Nuremberg
* Copyright (c) 2020 RRZE, University Erlangen-Nuremberg
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#ifndef __ALLOCATE_H_
#define __ALLOCATE_H_
......
......@@ -2,7 +2,7 @@
* =======================================================================================
*
* Author: Jan Eitzinger (je), jan.eitzinger@fau.de
* Copyright (c) 2019 RRZE, University Erlangen-Nuremberg
* Copyright (c) 2020 RRZE, University Erlangen-Nuremberg
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#ifndef LIKWID_MARKERS_H
#define LIKWID_MARKERS_H
......
......@@ -2,7 +2,7 @@
* =======================================================================================
*
* Author: Jan Eitzinger (je), jan.eitzinger@fau.de
* Copyright (c) 2019 RRZE, University Erlangen-Nuremberg
* Copyright (c) 2020 RRZE, University Erlangen-Nuremberg
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#ifndef __TIMING_H_
#define __TIMING_H_
......
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#include <timing.h>
double init(
......
......@@ -2,7 +2,7 @@
* =======================================================================================
*
* Author: Jan Eitzinger (je), jan.eitzinger@fau.de
* Copyright (c) 2019 RRZE, University Erlangen-Nuremberg
* Copyright (c) 2020 RRZE, University Erlangen-Nuremberg
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
......@@ -54,11 +53,10 @@
#define LIKWID_PROFILE(tag,call) \
_Pragma ("omp parallel") \
{LIKWID_MARKER_START(#tag);} \
times[tag][k] = call; \
_Pragma ("omp parallel") \
{LIKWID_MARKER_STOP(#tag);}
{LIKWID_MARKER_START(#tag);} \
times[tag][k] = call; \
_Pragma ("omp parallel") \
{LIKWID_MARKER_STOP(#tag);}
typedef enum benchmark {
INIT = 0,
......@@ -115,7 +113,7 @@ int main (int argc, char** argv)
};
LIKWID_MARKER_INIT;
_Pragma("omp parallel")
_Pragma("omp parallel")
{
LIKWID_MARKER_REGISTER("INIT");
LIKWID_MARKER_REGISTER("SUM");
......@@ -146,7 +144,7 @@ _Pragma("omp parallel")
#ifdef _OPENMP
printf(HLINE);
_Pragma("omp parallel")
_Pragma("omp parallel")
{
int k = omp_get_num_threads();
int i = omp_get_thread_num();
......@@ -177,13 +175,10 @@ _Pragma("omp parallel")
scalar = 3.0;
for ( int k=0; k < NTIMES; k++) {
LIKWID_PROFILE(INIT,init(b, scalar, N));
tmp = a[10];
LIKWID_PROFILE(SUM,sum(a, N));
a[10] = tmp;
LIKWID_PROFILE(COPY,copy(c, a, N));
LIKWID_PROFILE(UPDATE,update(a, scalar, N));
LIKWID_PROFILE(TRIAD,triad(a, b, c, scalar, N));
......
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#include <timing.h>
double sdaxpy(
......
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#include <timing.h>
double striad(
......
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#include <timing.h>
double sum(
......
......@@ -2,7 +2,7 @@
* =======================================================================================
*
* Author: Jan Eitzinger (je), jan.eitzinger@fau.de
* Copyright (c) 2019 RRZE, University Erlangen-Nuremberg
* Copyright (c) 2020 RRZE, University Erlangen-Nuremberg
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#include <stdlib.h>
#include <time.h>
......
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#include <timing.h>
double triad(
......
......@@ -24,7 +24,6 @@
*
* =======================================================================================
*/
#include <timing.h>
double update(
......
......@@ -4,7 +4,9 @@ bwBench.c contains a single file version of The Bandwidth Benchmark that is tail
It should compile with any C99 compiler.
# Benchmarking script
# Benchmarking scripts
## bench.pl to determine the absolute highest main memory bandwidth
Wrapper scripts in Perl (bench.pl) and Python (bench.py) are also provided to scan ranges of thread counts and determine the absolute highest sustained main memory bandwidth. In order to use them, `likwid-pin` has to be in your path. The script has three required and one optional command line arguments:
```
......@@ -18,3 +20,26 @@ The script will always use physical cores only, where two SMT threads is the def
```
$./bench.pl ./bwbench-GCC 14-24 10 1
```
## extractResults.pl to generate plottable output files from multiple scaling runs
Please see how to use it in the top-level [README](https://github.com/RRZE-HPC/TheBandwidthBenchmark#scaling-runs).
## benchmarkSystem.pl to benchmark a system and generate plots and markdown for the result wiki
**Please use with care!**
The script is designed to be used from the root of TheBandwidthBenchmark.
This script cleans and builds the currently configured toolchain. It expects that all Likwid tools are in the path!
Desired frequency settings must be already in place.
Usage:
```
perl ./benchmarkSystem.pl <DATA-DIR> <EXECUTABLE> <PREFIX>
```
where ```<DATA-DIR>``` is the directory where you want to store all results and generated output.
```<EXECUTABLE>``` is the bwBench executable name, which must match the toolchain configured in ```config.mk```, e.g. ```./bwBench-CLANG```.
```<PREFIX>``` is the file prefix for all generated output, e.g. ```Intel-Haswell```.
This diff is collapsed.