Monday, March 10, 2008

assigned_ptr: A smarter smart pointer for managing memory fragmentation

The smart_ptr library in C++ is quite useful for managing the lifespan of objects. It's as close as C++ gets for generic garbage collection. In this post I introduce a new smart_ptr, "assigned_ptr", which results in one less memory heap allocation and can attach custom memory allocation to an object without overriding that object's new/delete operator.

There are 2 basic types of smart pointers that are practical and recommended for use
1. scoped_ptr<T>
2. shared_ptr<T>

Both of these should be used whenever you're managing memory from the heap. The shared_ptr construct is a nice to use however it requires that there are an additional allocation from the Heap for it's shared counter.

Here's an ascii diagram of what's happening:


shared_ptr shared_ptr shared_ptr
\ | /
\ | /
\ | /
\ | /
\ | /
shared_counter* - - - - weak_ptr
|
|
|
|
|
T*


Notice that all the shared_counter* and T* both have to be pointers, while the shared_ptr's are usually allocated on the stack or in a container. Shared_counter is allocated on the heap and this is a performance penalty since shared_ptr is now responsible for allocating and deallocating two objects from the Heap. Besides the performance cost of heap allocation, there is also the cost of fragmenting the heap for shared_counter that can have affect memory performance for the whole application.

There is a reason why we live with this cost: weak_ptr, which is like shared_ptr except that it only prevents the shared_counter from being deallocated instead of both that and the object pointed to.

So using shared_ptr we live with the cost of the shared_counter because we might need a weak_ptr, but what kind of savings could we get if this weren't the case?

Enter assigned_ptr. It works just like shared_ptr except that:
1) it doesn't maintain a separate shared_counter in the heap
2) can't own an external pointer
3) can manage the allocation of an object without necessitating overriding the object's new/delete operator.

The trick here, is that the shared_counter and the object can be allocated on the same block of memory! This pattern matches the intrustive shared_ptr except that by design we are not coupling the implementation with the class.

this ascii diagram shows what this looks like:


assigned_ptr assigned_ptr assigned_ptr
\ | /
\ | /
\ | /
\ | /
\ | /
shared_mem_block*
+--------------+
| counter |
| T Obj |
+--------------+


All of the assigned_ptr's now point to the same shared_mem_block* which contains both the counter and the templated object. Both the object and the counter are constructed using the in-place new operator.

Here's the code that implements this:

#ifndef _ASSIGNED_PTR_H_
#define _ASSIGNED_PTR_H_

#include <new>

//mem_allocator, by default it will forward to delete/new
template<typename T>

class
mem_allocator {
public
:
static
char* Alloc() {

return
new char[sizeof(assigned_ptr_BASE::mem_header) + sizeof(T)];

}


static
void Free(void* ptr) {
delete
[] ptr;
}
};


class
assigned_ptr_BASE {
protected
:
struct
mem_header {

size_t shared_count;
(
void)(*Destructor)(void*);
(
void)(*Deallocator)(void*); //static function for a typed mem deallocator

};
};


template
<typename T>
class
assigned_ptr : public assigned_ptr_BASE {

friend class
mem_allocator<T>;

static
void TDestructor(void* aThisPtr) {

reinterpret_cast
<T*>(aThisPtr)->~T();
}


static
void TDeallocator(void* mem_header) {

mem_allocator<T>::Free(mem_header);
}




public
:

// ctor

assigned_ptr() {

buff = mem_allocator<T>::Alloc();

T* t = get();
new
(t)T();

mem_header *mem_hdr = get_mem_hdr();
mem_hdr->shared_count = 1;

mem_hdr->Destructor = &TDestructor;
mem_hdr->Deallocator = &TDeallocator;
}




//cctor
assigned_ptr(assigned_ptr& other) {
buff = other.buff;

mem_header *mem_hdr = buff->get_mem_hdr();
mem_hdr->shared_count++;
}


//dtor
~assigned_ptr() {
unref();
}



template
<typename U>

void
adopt(assigned_ptr<U>& u_ptr) {

//if U* can't be casted to T* then we will fail right here
T* t = u_ptr.get();
(
void)(t); //prevent compiler warning for above


unref();

//to do: think of a better way to implement this....
buff = reinterpret_cast<char*>(t) - sizeof(mem_header);

ref();
}


template
<typename U>
assigned_ptr<T>& operator=(assigned_ptr<U>& u_ptr) {

adopt(u_ptr);
return
*this;
}


T& operator*() {

T &t = *get();
return
t;
}


T* get() {

return

reinterpret_cast
<T*>(
buff + sizeof(mem_header));
}


void
unref() {

if
(buff){
mem_header *mem_hdr = get_mem_hdr();

mem_hdr->shared_count--;

if
(0 == mem_hdr->shared_count) {

T* t = get();
mem_hdr->Destructor(t);

mem_hdr->Deallocator(mem_hdr);
}

buff = NULL;
}
}


private
:

void
ref() {
mem_header *mem_hdr = get_mem_hdr();

mem_hdr->shared_count++;
}




T& operator->() {
return
*get();
}


private
:

assigned_ptr_BASE::mem_header* get_mem_hdr() {
return
reinterpret_cast<mem_header*>(

buff);
}


char
* buff;
};


#endif //_ASSIGNED_PTR_H_



Notice that we use a static allocator to allocated the block of memory that the T object is constructed on. By default, we pass all the memory allocations to malloc and free. Overriding this is simple, we simply provide a template specialization for the allocator of a specific type and we get "plug-in" memory pools for that type. Here's a simple example of providing a pooled memory allocation for floats:



template<>
class Allocator<float> {
MemPool<
sizeof(floats) + sizeof(assigned_ptr_counter),
100> mMemPool; //assume a memory pool class with Alloc() and Free()

static char* Alloc() {
return mMemPool.Alloc();
}

static void Free(void* ptr) {
mMemPool.Free(ptr);
}
}

No comments: