Building a Memory Allocator from Scratch

—

Adam

What would be a better way to learn how memory works on low level than writing memory allocator? That’s what I thought and during this journey I realised it is a little bit more complicated than it seems.

What really happens when you use memory allocators inside a program? Behind the scenes, it needs to find a suitable block of memory, split or merge chunks as needed, and keep track of allocations—all while trying to avoid fragmentation. It’s a balancing act of efficiency and performance. But how does it actually work? Let’s dive in.

Initializing and Managing a Fixed Memory Pool

Memory pools can be structured in different ways depending on the needs of an application. Some allocators use bitmaps to track fixed-size memory chunks efficiently, while others rely on segregated free lists, grouping blocks by size for faster allocation. In high-performance systems like the Linux kernel, slab allocation (ex. 1) is often preferred to minimize fragmentation.

ex. 1 - slab allocation — (ex. 1 – slab allocation)

For this allocator, I chose a linked list approach (ex. 2). It provides flexibility for handling variable-sized allocations while allowing free blocks to be split and merged dynamically. Though it may not be the fastest method, it offers a straightforward way to manage memory without excessive overhead.

(ex. 2 - visualization of the linked list for this project) — (ex. 2 – visualization of the linked list for this project)

The linked listed above can be easily described using below struct in C which I named „BlockMeta”. As the metadata I will save the memory block size, int value that flags if the block is free or taken and lastly a pointer to the next block. The pointer called „free_list” will store location for the first header in the linked list – for now it is NULL.

typedef struct BlockMeta {

	size_t size;
	int free;
	struct BlockMeta* next;

} BlockMeta ;

BlockMeta* free_list = NULL;

Now for requesting the memory pool I decided to use VirtualAlloc() function. It makes the allocator Windows-specific but it provides contiguous virtual memory pages meaning the OS will give us requested amount of memory as a single chunk – it makes managing it a lot easier.

First I need to check if the „free_list” is not populated, we don’t want to request memory twice by mistake.

if (free_list != NULL) return 1;

Then I can request the memory, cast it to the (BlockMeta*) type and save it as the free_list. Earlier I defined the POOL_SIZE as 4096 bytes, since this is the default page size on most modern systems. Requesting number of bytes that is multiple of this size ensures optimal efficiency.

free_list = (BlockMeta*)VirtualAlloc(NULL, POOL_SIZE, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

After calling VirtualAlloc I perform simple check if it succeeded. If not, the function will return 0.

if (NULL == free_list) 
{
	printf("mem_init failed\n");
	return 0;
}

At this point free_list is populated with a valid pointer to the first memory block the metadata fields are populated with correct size (size of the pool reduced by the header size) and availability flag; for now the pointer to the next block is NULL as there is only one. Lastly the function returns 1 for success.

free_list->size = POOL_SIZE - sizeof(BlockMeta);
free_list->free = 1;
free_list->next = NULL;

Memory Alignment: Why It Matters

There is an important rule regarding data alignment: A data type of size N bytes must be stored at a memory address that is a multiple of N. That is also true in this case where the user can request any amount of memory.

The word size (smallest natural unit of memory the CPU operates on) is 8 bytes on a 64-bit system. This means that the CPU reads and writes in 8-byte chunks for efficiency.

Below is how I implemented 8 byte alignment for any given requested block – it is very short but crucial:

size_t aligned_size = (size_requested + 7) & ~7;

Explanation of +7: By first adding 7 to the requested size, I make sure that when size_requested is not multiple of 8 it’s always rounded up not down.

Explanation of ~7: All numbers in binary that are multiple of 8 have the first 3 least significant bits set to 0 eg.

8 – 0000 1000
16 – 0001 0000
24 – 0011 0000
104 – 0110 1000

Bitwise NOT operator sets the binary representation of 7 to 0.

Side note: similar effect can be achieved by multiplication but it is much slower than bitwise operation.

Now that alignment is ensured, the next step is to find a free block where the requested memory can fit.

Finding the Right Block: First-Fit vs. Best-Fit Allocation vs. Worst Fit

When a program requests memory we can allocate it based on three main search algorithms:

First-Fit – The memory will be allocated to the first free block that is big enough to accommodate for the requested size. We start searching from the beginning of the list and iterate until we have a match.
Best-Fit – Searching for the smallest free block that is large enough to hold the requested amount of memory. This reduces fragmentation but adds a significant overhead.
Worst-Fit – The opposite of Best-Fit; searches for the largest block, leaving behind much bigger blocks for future use. It can lead to inefficient use of memory.

In this case the fastest and easiest one to implement will be First-Fit. Here’s my take on implementing it.

BlockMeta* current = free_list;
while (current != NULL)
{
        if (current->free != 1 || current->size < aligned_size) { 
		current = current->next;
		continue;
	}
	current->free = 0;
	return (void*)((char*)current + sizeof(BlockMeta));		
}

current is a temporary pointer of type BlockMeta* – it will be used to hold the header of the block that we are currently modifying.

The code iterates through the blocks using the while loop and the first if statement – the block must be free and it’s size must be bigger than the earlier aligned size.

Before returning a pointer to the first address of the memory block the free flag is set to 0. Then the current pointer is moved past the BlockMeta header.

If the list looked like this (ex. 3), the A block would always be skipped as it is already taken. If the requested and later aligned size would be 1600 bytes – both A and B would be skipped.

There is one thing missing after the right block for the requested memory is found – the allocator needs to check if the block is splittable.

Efficient Memory Allocation: Splitting Blocks

After requesting the memory from OS using VirtualAlloc we have only one contiguous memory block. The issue here is that if someone requests only half of it, the second half will go to waste as we don’t have the means to manage it.

In such cases we need to split it into smaller blocks and return only the amount that was requested. The rest will be separated by the BlockMeta header.

All blocks that are splittable will need to be bigger than the requested size and the size of the BlockMeta header that will come after it in order to hold information about the newly created memory block.

if (current->size > aligned_size + sizeof(BlockMeta)) {

BlockMeta* new_block = (BlockMeta*)((char*)current + sizeof(BlockMeta) + aligned_size);

Now that the pointer is set, we can assign metadata to the new header. Size of the new block is calculated by substracting the requested size and the header size. Also whatever block current->next was holding we pass it to the new_block->next to ensure the list is still linked in the right order.

new_block->size = current->size - aligned_size - sizeof(BlockMeta);
new_block->free = 1;
new_block->next = current->next;

Lastly let’s not forget to update the current block to the right values.

current->size = aligned_size;
current->next = new_block;

The block is now successully split ! I will later describe on how undoing it is also crucial to memory management. In the meantime let’s jump to some challenges associated with freeing the memory blocks.

Freeing Memory: The Danger of Double-Free and Dangling Pointers

Freeing memory is just as important as allocating it. If done incorrectly, it can lead to serious memory bugs, including double-free vulnerabilities and dangling pointers, which can cause crashes, unpredictable behavior, or even security exploits.

Double-free happens when a program tries to free a memory block that has already been freed. We can check for that inside the allocator. This is not the case when it comes to dangling pointers though.

Dangling Pointer is when the original pointer still holds the old address, even though the memory has been freed. Using it in such case can cause undefined behavior or use-after-free vulnerability. Checks for this inside the allocator logic itself are not possible (at least not in C) and it is up to the programmer/program to do so manually.

In the function that is responsible for returning a memory block back to the free list we have to include some security checks.

Freeing a NULL pointer.

The first case that would result in undefined behaviour would be trying to free NULL pointer. A easy fix is to just check that. Below if statement returns 0 if that is the case.

if (NULL == ptr) { return 0; }

Freeing an out-of-bound or invalid memory address. a NULL pointer.

If a program tried to provide any address that is not related to the allocator memory block this would result in the allocator trying to access memory that it does not own – further leading to undefined behaviour.

Additionally we need to factor in an edge-case scenario where the pointer is in fact in-bound of the memory pool but does not point to the first address of of an allocated block. Below snippet will account for both cases.

BlockMeta* current = free_list;
BlockMeta* block = NULL;

while (current != NULL) {

	if ((char*)current + sizeof(BlockMeta) == (char*)ptr) {

		block = current;
		break;
	}

	current = current->next;
}

if (NULL == block) { return 0; }

The logic here is going through every header comparing the provided pointer to valid address. If it does not find it, the block pointer remains NULL resulting in exiting the function.

This solution is O(n) complexity – In the worst case, freeing a block requires scanning all headers in the list. If many small blocks exists, freeing becomes expensive. In high-performance allocator this is usually addressed using hash table or using separate free lists per block size. However in a simple linked list allocator like this it is a reasonable trade-off.

Checking for double-free

This check is straight forward because we can use the free flag.

if (1 == block->free) { return 0; }

After the security checks have been made and we are sure the pointer is valid, the only thing left to do is to flag the block as free.

block->free = 1;

At this point all the freed memory blocks remain the same in size, it can lead to fragmentation. Here coalesing is our friend. Let’s take a look on how it works.

Merging Free Blocks: Combating Fragmentation

Imagine a situation where after splitting blocks and freeing the memory we have 5 adjacent blocks with 24 byte size (ex. 4). We cannot assign anything larger than 24 bytes to the list! This issue is example of external fragmentation. Coalescing is the answer here!

(ex. 4 - fragmented memory) — (ex. 4 – fragmented memory)

My approach to solving this is to move through the linked list and keeping an eye on two blocks at a time – this lets it see if both of them are free.

In order to coalesce more than 2 blocks at a time I nested one while loop into another like so.

BlockMeta* next_block = NULL;
current = free_list;

while (current != NULL)
{
	next_block = current->next;

	while (1 == current->free && NULL != next_block && 1 == next_block->free && (char*)current + sizeof(BlockMeta) + current->size == (char*)next_block)
	{
		current->size += next_block->size + sizeof(BlockMeta);
		current->next = next_block->next;

		next_block = current->next;
	}

	current = current->next;

}

The outer while (current != NULL) loop iterates through the free list block by block. The inner while loop checks if the next block is also free and merges it with current. After merging two blocks, the next_block variable is updated to current->next, and the loop keeps checking and merging until it hits a used block. This means that as long as free blocks are adjacent, they will all be merged in one go before moving to the next non-free block.

If you’d like to see how this works in a POC take a look at the section below!

Side note: Unfortunately coalescing is not enough for non-adjacent free blocks – If two free blocks are separated by an allocated block, they cannot be merged.

Proof of Concept: Debugging and Visualizing Memory Allocation

As a POC I’ve decided to just run the allocator in debugging mode and show you how the memory list looks like and how it changes.

Allocating 5 blocks sized 100, 200, 300, 400, 500 bytes

Running mememory() function which is responsible for memory allocation of aligned size and splitting blocks after. I’ve also put a break point before return statement for debugging purposes (ex. 5).

(ex. 5 - allocating 5 memory blocks) — (ex. 5 – allocating 5 memory blocks)

Here is how the memory list looks like (ex. 6). It resulted in 6 blocks, 5 of which were requested and 1 block remains free. All of the blocks have aligned size to be multiple of 8 bytes. Each block is perfectly pointing to the next one ensuring list continuity.

(ex. 6 - linked list in debug mode) — (ex. 6 – linked list in debug mode)

To be 100% sure let’s add everything up to see if it’s performing the allocation without any issues. Reminder that the POOL_SIZE is 4096 bytes.

Adding all blocks: 104 + 200 + 304 + 400 + 504 + 2440 = 3952 bytes

Having 6 blocks there are also 6 BlockMeta headers, each being 24 bytes in size.

6 x 24 + 3952 = 4096 bytes !

Freeing up the memory and coalescing test

Now let’s allocate memory the same way as before and free blocks 1, 2, 4, 5. Purposefully ommiting the 3rd block to show how the adjacent block merge together and the inevitable fragmentation (ex. 7).

(ex. 7 - Two merged blocks and one allocated block in between causing fragmentation) — (ex. 7 – Two merged blocks and one allocated block in between causing fragmentation)

Again to make sure let’s calculate it.

The resulting size after merging two first blocks is result of their aligned sizes and the second header that is no longer needed. The first header still stays holding the metadata.

104 + (24 + 200) = 328 bytes

Now the blocks 4 and 5 along with the remaining memory block.

400 + (24 + 504) + (24 + 2440) = 3392 bytes

Just as an additional test, I’ve freed the third block (ex. 8) – let’s see if it will all merge into one big memory pool.

(ex. 8 - All blocks have coalesced into one) — (ex. 8 – All blocks have coalesced into one)

Here it is! A single block of 4072 bytes – after adding the only header we are left with the correct pool size – 4096 bytes.

<—— Go back