
There are many ways to share data between processes.

The most intuitive way is probably a queue. In fact, I’ve used messaging libraries like ZeroMQ to establish communication between processes. ZeroMQ is an off-the-shelf library that supports so many languages that it is almost language agnostic. If you are a C++ programmer, you’ve probably heard of the moodycamel queues. These queues are also blazingly fast.

Another notable IPC method is shared memory. There are lots of shared memory libraries, and I’ve looked into two of them: Apache Arrow’s Plasma and Boost.Interprocess. The former has unfortunately been deprecated since Arrow version 10.0.0, so that’s out the window. Boost.Interprocess gets the job done, but working with the Boost libraries can be a hassle. So instead of using a library, I’ve taken the time to write my own simple shared memory library.

Establishing a shared memory region can be done by simply calling mmap and passing the returned pointer to the processes that need to communicate:

size_t size = 5096;
void* ptr = ::mmap(
        NULL,                       // addr: if addr is NULL, the kernel chooses the (page-aligned) address
                                    // at which to create the mapping
        size,                       // length: number of bytes (the kernel maps whole pages)
        PROT_READ | PROT_WRITE,     // prot: pages may be read and written
        MAP_SHARED | MAP_ANONYMOUS, // flags: share this mapping with other processes | mapping is not
                                    // backed by any file
        -1,                         // fd: ignored for anonymous mappings; -1 for portability
        0                           // offset: should be 0 for anonymous mappings
    );
if (ptr == MAP_FAILED) {            // mmap reports failure with MAP_FAILED, not NULL
    perror("mmap");
    return -1;
}
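
As a minimal sketch of the same call with error handling factored out, the two helpers below wrap mmap and munmap. The names `create_shared_region` and `destroy_shared_region` are hypothetical and chosen only for illustration; they are not part of any library.

```cpp
#include <sys/mman.h>   // mmap, munmap, MAP_FAILED
#include <cstddef>      // std::size_t
#include <cstdio>       // std::perror

// Hypothetical helper: returns a shared anonymous mapping of `size` bytes,
// or nullptr if mmap fails. Anonymous mappings start out zero-filled.
void* create_shared_region(std::size_t size)
{
    void* ptr = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) {
        std::perror("mmap");
        return nullptr;
    }
    return ptr;
}

// Hypothetical helper: releases the mapping; returns true on success.
bool destroy_shared_region(void* ptr, std::size_t size)
{
    return ::munmap(ptr, size) == 0;
}
```

Returning nullptr instead of MAP_FAILED is only a convenience for callers here; the important part is that the result of mmap is checked before it is used.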

Note that we set the MAP_ANONYMOUS flag, so our shared memory is not backed by any file. Later we will explore file-backed shared memory, but for now this gets the job done. We have the shared memory pointer, but how do we pass it to other processes? Here we can simply rely on fork: a child process inherits the parent's mappings, so the same pointer is valid in the child.

const int num_procs = 3;
pid_t pids[num_procs];

for (int i = 0; i < num_procs; ++i)
{
    if ((pids[i] = fork()) < 0)
    {
        perror("fork, cleaning up...");
        // only signal the children that were actually started
        for (int j = 0; j < i; ++j)
        {
            ::kill(pids[j], SIGTERM);
        }
        while (::wait(NULL) > 0);
        ::munmap(ptr, size);
        return -1;
    }
    else if (pids[i] == 0)
    {
        // child process: do the work, then exit so it does not
        // fall through the loop and fork children of its own
        run_app(i, ptr);
        ::_exit(0);
    }
}

// parent: wait for child processes to finish
while (::wait(NULL) > 0);
::munmap(ptr, size);

return 0;
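
To make the fork-based sharing concrete, here is a small self-contained sketch. It assumes a trivial workload in place of `run_app`: each child writes one integer into its own slot of the shared region, and the parent sums the slots after the children exit. The function name `sum_written_by_children` is hypothetical.

```cpp
#include <sys/mman.h>   // mmap, munmap
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork, _exit
#include <cstddef>
#include <vector>

// Hypothetical demo: child i writes (i + 1) into slot i of a shared int array.
// Returns the sum of all slots as seen by the parent, or -1 on any failure.
int sum_written_by_children(int num_procs)
{
    if (num_procs <= 0) return -1;

    const std::size_t size = sizeof(int) * static_cast<std::size_t>(num_procs);
    void* raw = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) return -1;
    int* slots = static_cast<int*>(raw);

    std::vector<pid_t> pids;
    bool ok = true;
    for (int i = 0; i < num_procs; ++i) {
        pid_t pid = ::fork();
        if (pid < 0) { ok = false; break; }  // stop forking, still reap below
        if (pid == 0) {
            slots[i] = i + 1;  // the write lands in the shared pages
            ::_exit(0);        // leave the child without running the parent's code
        }
        pids.push_back(pid);
    }

    // Reap every child that was started before reading the results.
    for (pid_t pid : pids) ::waitpid(pid, nullptr, 0);

    int sum = 0;
    for (int i = 0; i < num_procs; ++i) sum += slots[i];
    ::munmap(raw, size);
    return ok ? sum : -1;
}
```

Because each child writes to a different slot and the parent reads only after waiting for them, no locking is needed in this sketch. Processes that touch the same bytes concurrently would need their own synchronization.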

And we are done. But wait, what if we want several shared memory regions, each with its own pointer? Then we can call mmap multiple times and munmap each region accordingly. However, things get ugly very quickly: what happens if one of the mmap calls fails? Then we need to make sure to munmap the regions that were already mapped.

size_t size = 5096;

// declare everything up front: C++ does not allow goto to jump over an initialization
void* ptr1 = MAP_FAILED;
void* ptr2 = MAP_FAILED;
void* ptr3 = MAP_FAILED;

ptr1 = ::mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (ptr1 == MAP_FAILED){
    goto done;
}
ptr2 = ::mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (ptr2 == MAP_FAILED){
    goto munmap_ptr1;
}
ptr3 = ::mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (ptr3 == MAP_FAILED){
    goto munmap_ptr2;
}

// ....

// success path falls through and releases everything in reverse order
    ::munmap(ptr3, size);
munmap_ptr2:
    ::munmap(ptr2, size);
munmap_ptr1:
    ::munmap(ptr1, size);
done:

return 0;

This is the classic C way of doing things and, in fact, if we are working with kernel code, this is the cleanest way to handle it. However, since we are using C++, there is a better way to handle it: RAII. Let us wrap mmap in a class so that the constructor calls mmap and the destructor calls munmap:

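// mmap_wrapper.h
#include <sys/mman.h>
#include <cstddef>

// forward declarations needed by the friend declarations below
template<typename T> class Mlock;
template<typename T> class MmapHandler;

template<typename T>
class Mmap{
protected:
    size_t m_size;
    T* m_ptr;
public:
    explicit Mmap(size_t);
    ~Mmap();
    // copying would make two objects munmap the same region
    Mmap(const Mmap&) = delete;
    Mmap& operator=(const Mmap&) = delete;

    void* mmap_wrapper(size_t size);
    T* get() const { return m_ptr; } // read-only access to the mapped pointer

    friend class Mlock<T>;
    friend class MmapHandler<T>;
};
// mmap_wrapper.inl
template<typename T>
inline Mmap<T>::Mmap(size_t size) : 
      m_size(size)
{
    void* ptr = mmap_wrapper(m_size);
    if(ptr == MAP_FAILED){
        throw MmapFailException(); // exception type defined elsewhere in the library
    }
    m_ptr = static_cast<T*>(ptr);
}


template<typename T>
inline Mmap<T>::~Mmap(){
    // the constructor throws on failure, so this check is purely defensive
    if(m_ptr != MAP_FAILED){
        munmap(static_cast<void*>(m_ptr), m_size);
    }
}


template<typename T>
inline void* Mmap<T>::mmap_wrapper(size_t size) {
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}

And we are done! Now creating a shared memory region takes a single line, and we no longer have to munmap it ourselves:

// POD data
struct SharedData{
    std::array<int, 10> data1;
    std::array<float, 10> data2;
};

Mmap<SharedData> sm(5096);       // must be at least sizeof(SharedData) bytes
SharedData& sm_data = *sm.get(); // m_ptr itself is protected, so go through the accessor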

We can pass either the shared memory pointer itself or a reference to the data we actually want processes to read and write. Here we used an anonymous mapping and the fork system call to build shared memory IPC. Note that this is a bit restrictive: only processes forked after the mapping was created inherit it, so unrelated processes cannot access the shared memory.
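
Putting the RAII idea and fork together, the sketch below shows a parent reading a value that a child wrote through the shared struct. `SharedRegion` is a compact, hypothetical stand-in for the wrapper class so that the example compiles on its own, and it sizes the mapping with `sizeof(T)` rather than taking a size argument. It throws `std::runtime_error` instead of a custom exception type.

```cpp
#include <sys/mman.h>   // mmap, munmap
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork, _exit
#include <array>
#include <stdexcept>

struct SharedData {
    std::array<int, 10> data1;
    std::array<float, 10> data2;
};

// Hypothetical stand-in for the wrapper: maps sizeof(T) bytes in the
// constructor and unmaps them in the destructor.
template <typename T>
class SharedRegion {
    T* m_ptr;
public:
    SharedRegion() {
        void* p = ::mmap(nullptr, sizeof(T), PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        m_ptr = static_cast<T*>(p);
    }
    ~SharedRegion() { ::munmap(m_ptr, sizeof(T)); }
    SharedRegion(const SharedRegion&) = delete;
    SharedRegion& operator=(const SharedRegion&) = delete;
    T& operator*() const { return *m_ptr; }
};

// Hypothetical demo: the child stores 42 in the shared struct and the
// parent returns what it reads back, or -1 if fork fails.
int read_value_set_by_child()
{
    SharedRegion<SharedData> region;
    SharedData& data = *region;

    pid_t pid = ::fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        data.data1[0] = 42;
        ::_exit(0);  // skips destructors; the kernel drops the child's mapping on exit
    }
    ::waitpid(pid, nullptr, 0);  // make sure the child has finished writing
    return data.data1[0];
}   // region's destructor unmaps the memory in the parent here
```

Each process holds its own mapping of the same pages, so the parent's munmap does not affect a child that is still running, and a child that exits simply releases its own view.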

In the near future we will see how we can make this more scalable and modular!
