OpenMP Tutorial

Simple OpenMP Example

Using OpenMP to parallelize the addition of two arrays is relatively straightforward. Here’s an example to demonstrate this:

#include <iostream>
#include <vector>
#include <omp.h>

int main() {
    const int SIZE = 100000; // Example size
    std::vector<int> a(SIZE, 1);  // Initialize all values to 1
    std::vector<int> b(SIZE, 2);  // Initialize all values to 2
    std::vector<int> result(SIZE, 0); // Result array

    #pragma omp parallel for
    for (int i = 0; i < SIZE; i++) {
        result[i] = a[i] + b[i];
    }

    // Check the result (Optional)
    for (int i = 0; i < 10; i++) { // Print the first 10 results for verification
        std::cout << result[i] << " ";
    }
    std::cout << std::endl;

    return 0;
}

This program initializes two arrays, a and b, with values of 1 and 2, respectively. It then uses OpenMP to parallelize the addition of the arrays, storing the result in the result array.

The directive #pragma omp parallel for tells the compiler to parallelize the loop that follows, with each iteration potentially being executed by a different thread. OpenMP handles the thread creation, distribution of loop iterations among threads, and thread cleanup automatically.

Before running this, make sure your compiler supports OpenMP, and you might need to enable it explicitly. For example, when using the GCC compiler:

g++ -fopenmp your_program.cpp -o your_program

This command tells GCC to compile with OpenMP support.

Private clause

The private clause lets you declare a variable outside a parallel region while still giving each thread its own copy of it. The following example computes the sum of the numbers from 1 to N: local_sum is declared outside the parallel region and then made private with the private clause:

#include <iostream>
#include <omp.h>

int main() {
    const int N = 100000;
    long long global_sum = 0; // long long: the sum (5,000,050,000) exceeds int range
    long long local_sum = 0;  // Declared outside the parallel region

    #pragma omp parallel private(local_sum)
    {
        local_sum = 0; // Initialize the private copy for each thread

        #pragma omp for
        for (int i = 1; i <= N; i++) {
            local_sum += i;
        }

        // Critical section to safely update the global sum
        #pragma omp critical
        {
            global_sum += local_sum;
        }
    }

    std::cout << "Sum of numbers from 1 to " << N << " is: " << global_sum << std::endl;
    return 0;
}

In this example:

  • The local_sum variable is declared outside the parallel region.
  • The private(local_sum) clause explicitly ensures that each thread has its own private copy of the local_sum variable. Each thread’s copy is uninitialized at the start of the parallel region.
  • We initialize local_sum to 0 inside the parallel region to make sure each thread’s private copy starts with a value of 0.
  • Just as before, each thread calculates its local sum and then adds its result to the global_sum inside a critical section.

By using the private clause, we make sure that each thread gets its own separate copy of local_sum, preventing data races and interference between threads.

Critical section

A critical section ensures that only one thread at a time executes the enclosed block of code. This is necessary whenever multiple threads update a shared variable, because unsynchronized concurrent updates produce a data race.

Here's a simple illustration that computes the sum of all numbers from 1 to N using OpenMP. The work is divided among threads, each thread accumulates the partial sum for its chunk of numbers, and a critical section protects the final update of the shared total.

#include <iostream>
#include <omp.h>

int main() {
    const int N = 100000;
    long long global_sum = 0; // long long: the sum (5,000,050,000) exceeds int range

    #pragma omp parallel
    {
        long long local_sum = 0; // Private to each thread by default,
                                 // because it's declared inside the parallel region.

        #pragma omp for
        for (int i = 1; i <= N; i++) {
            local_sum += i;
        }

        // Critical section to safely update the global sum
        #pragma omp critical
        {
            global_sum += local_sum;
        }
    }

    std::cout << "Sum of numbers from 1 to " << N << " is: " << global_sum << std::endl;
    return 0;
}

In the example above:

  1. The local_sum variable, which is declared inside the parallel region, is automatically private to each thread. Thus, each thread has its own copy of local_sum, and they can safely perform operations on it without worrying about data races.

  2. After calculating the local sum, each thread adds its local_sum to the shared global_sum. This operation is surrounded by a critical section to ensure that only one thread at a time updates the global_sum. This avoids potential data races.

If you didn’t have the local_sum variable (or didn’t make it private), you’d end up with a race condition where multiple threads try to update the global sum simultaneously, leading to unpredictable results.