2020-04-13

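std::atomic Usage and Examples

This post introduces C++'s std::atomic and walks through a few examples.

std::atomic guarantees that operations on it are atomic, but it does not guarantee that the specialization for a given type T is lock-free,
because implementations differ across platforms. You can call the is_lock_free member function to check.
std::atomic_flag, on the other hand, is guaranteed to be lock-free.

The sections below build up step by step to using atomic, and explain why you would want it.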

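The simplest, most intuitive version, which gives the wrong result

The following example runs 100 threads at once, each incrementing the global variable cnt 100000 times.
Because multiple threads access cnt at the same time without any synchronization, this is a data race and the count comes out wrong.
Let's see what the output looks like!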

std-atomic.cpp

// g++ std-atomic.cpp -o a.out -std=c++11 -pthread
#include <chrono>   // std::chrono is used for timing in main()
#include <iostream>
#include <thread>

using namespace std;

long cnt = 0;

void counter()
{
    for (int i = 0; i < 100000; i++) {
        cnt += 1;
    }
}

int main(int argc, char* argv[])
{
    auto t1 = std::chrono::high_resolution_clock::now();
    std::thread threads[100];
    for (int i = 0; i != 100; i++)
    {
        threads[i] = std::thread(counter);
    }
    for (auto &th : threads)
        th.join();

    auto t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> elapsed = t2 - t1;
    std::cout << "result: " << cnt << std::endl;
    std::cout << "duration: " << elapsed.count() << " ms" << std::endl;
    return 0;
}

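The output is as follows:

result: 1866806
duration: 61.7727 ms

The result is not the 10000000 we expected (100 threads x 100000 increments), and it is different on every run. Let's try adding a mutex lock!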

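Next, the most intuitive fix: adding a mutex

Protecting the critical section with a mutex is the most common approach.
A mutex ensures that only one thread can access cnt at a time. If you're not familiar with mutex, see the earlier mutex introduction.
Let's see what the output looks like this time!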

std-atomic2.cpp

// g++ std-atomic2.cpp -o a.out -std=c++11 -pthread
#include <chrono>   // std::chrono is used for timing in main()
#include <iostream>
#include <thread>
#include <mutex>

using namespace std;

long cnt = 0;
std::mutex mtx;

void counter()
{
    for (int i = 0; i < 100000; i++) {
        // lock_guard locks mtx here and unlocks it at the end of each iteration,
        // the same effect as calling mtx.lock() / mtx.unlock() around the increment.
        std::lock_guard<std::mutex> lock(mtx);
        cnt += 1;
    }
}

int main(int argc, char* argv[])
{
    auto t1 = std::chrono::high_resolution_clock::now();
    std::thread threads[100];
    for (int i = 0; i != 100; i++)
    {
        threads[i] = std::thread(counter);
    }
    for (auto &th : threads)
        th.join();

    auto t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> elapsed = t2 - t1;
    std::cout << "result: " << cnt << std::endl;
    std::cout << "duration: " << elapsed.count() << " ms" << std::endl;
    return 0;
}

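The output is as follows:

result: 10000000
duration: 1426.77 ms

This time the answer is correct. On my machine it took 1426.77 ms; this number will vary with your hardware.
Using a mutex correctly guarantees correct data, but at a cost in performance.
So naturally we wonder: is there a faster way? atomic?

Using atomic for the same result, roughly 6x faster, a big performance win

Now for the main event of this post: atomic.
For a long, you can use std::atomic<long>, or the named type std::atomic_long.
atomic can achieve the same result, but does it take less time?
Let's see what the output looks like!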

std-atomic3.cpp

// g++ std-atomic3.cpp -o a.out -std=c++11 -pthread
#include <chrono>   // std::chrono is used for timing in main()
#include <iostream>
#include <thread>
#include <atomic>

using namespace std;

//std::atomic<long> cnt(0);
std::atomic_long cnt(0);

void counter()
{
    for (int i = 0; i < 100000; i++) {
        cnt += 1;
    }
}

int main(int argc, char* argv[])
{
    auto t1 = std::chrono::high_resolution_clock::now();
    std::thread threads[100];
    for (int i = 0; i != 100; i++)
    {
        threads[i] = std::thread(counter);
    }
    for (auto &th : threads)
        th.join();

    auto t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> elapsed = t2 - t1;
    std::cout << "result: " << cnt << std::endl;
    std::cout << "duration: " << elapsed.count() << " ms" << std::endl;
    return 0;
}

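The output is as follows:

result: 10000000
duration: 225.587 ms

This time the atomic version is also correct, and it took only 225.587 ms on my machine, roughly 6x less than the 1426.77 ms of the mutex version!
A huge performance boost!!
Now that we've seen the power of atomic, the following sections cover more practical scenarios for it.

Using atomic with user-defined types/classes

Next, we'd like to see whether atomic also works with a user-defined struct or class.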

std-atomic4.cpp

TBD

References
https://en.cppreference.com/w/cpp/atomic/atomic
https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/511579/
https://blog.csdn.net/yockie/article/details/8838686