Linked List Issues

Match word(s).	If you have any questions or comments, please visit us on the Forums.
FAQ > Prelude's Corner > Linked List Issues
This item was added on: 2004/09/26 Introduction What is a linked list? List processing List reversal List splicing List sorting List indexing Some useful techniques Introduction Linked lists are very useful data structures. They are used a great deal strictly as themselves, but they serve as a foundation for more complicated linked data structures. It can safely be said that if you don't understand linked lists intimately, you'll be lost with simple binary trees, and have no hope concerning the more complex structures such as sparse matrices. That said, this tutorial will talk about common (and uncommon) usage of linked lists as well as a little bit of theory. Not much though, I was always a practical person who preferred pseudocode to flowcharts. First and foremost, a refresher course (ignoring the fact that just above this tutorial are probably two more). What is a linked list? A linked list is basically a collection of items where each item points to the next one: \|1\| -> \|2\| -> \|3\| -> \|4\| -> \|5\| -> ~ This sequence usually ends with a special item value called a sentinel, most often it will be a simple null pointer. Each item is a structure with a pointer to a structure of the same type as the "next" pointer, the pointer that tells you where to find the next item in the collection: struct item { T data; struct item next; }; This is simple and sufficient for lists that are only traveled unidirectionally (eg. from the first to the last), but if you want to go back to an item you came from, another pointer is required, called the "prev" pointer: struct item { T data; struct item prev; struct item next; }; By using the above structure, you can create lists of items that can be traveled both forward and backward. In other words, you can get to every item in the collection no matter where you are. It would look something like this: \|1\| \|2\| \|3\| \|4\| \|5\| ~ <- \| \| <- \| \| <- \| \| <- \| \| -> \| \| \| \| -> \| \| -> \| \| -> \| \| -> \| \| -> ~ Provided you can read my awful art, you'll notice that the prev pointers act the same way as the next pointers, for example there is a sentinel before the first item in the list to mark the end of the prev pointers. To access any item in the list you can use dereferencing: Go to the next item item = item->next; Go to the previous item item = item->prev; Go to the second item item = item->next->next; Go to yourself item = item->next->prev; Go to the end of a five item list and then back two items, then forward one item = item->next->next->next->next->prev->prev->next Now, while this link chaining can be fun (I even got a little carried away ), it's usually easier to use loops. Here is one common idiom for walking a list: for ( item = list; item != NULL; item = item->next ) { visit ( item ); / visit does something with a single item / } Pretty simple, you may even have noticed the similarity with another common C/C++ idiom, one for walking an array: for ( i = 0; i < size; i++ ) { visit ( array[i] ); } A little more should be said about the sentinel before we continue. There are three types of sentinel: a null pointer, a dummy node (an item with a special invalid value that can be used as a sentinel), and a circular link. The null pointer is simple enough, as is the dummy node, but what's a circular link? A circular link turns the list into a ring type structure where, when you get to the end and go to the next item, you're back at the front: ^----->--------->------->------>- \| \| \| \|1\| \|2\| \|3\| \|4\| \|5\| -< \| \| <- \| \| <- \| \| <- \| \| -> \| \| \| \| -> \| \| -> \| \| -> \| \| -> \| \| >- ^ \| \| \| -<---------<--------------<------ If you can follow that diagram, you're either as crazy as me, or you've got the concepts of a circular linked list down. A few basic conventions for each type of list are as follows (we'll use a single linked list (just next pointers) to keep it simple: Head pointer, null tail: Initialize: list = NULL; Insert n after item: if ( item == NULL ) { list = n; list->next = NULL; } else { n->next = item->next; item->next = n; } Delete after item: save = item->next; item->next = item->next->next; Traverse: for ( item = list; item != NULL; item = item->next ) Dummy head, null tail: Initialize: struct item list; list->next = NULL; Insert n after item: n->next = item->next; item->next = n; Delete after item: save = item->next; item->next = item->next->next; Traverse: for ( item = list->next; item != NULL; item = item->next ) Dummy head, dummy tail: Initialize: struct item list; struct item sentinel; list->next = sentinel; sentinel->next = sentinel; Insert n after item: n->next = item->next; item->next = n; Delete after item: save = item->next; item->next = item->next->next; Traverse: for ( item = list->next; item != sentinel; item = item->next ) Circular list: Initialize: list->next = list; Insert n after item: n->next = item->next; item->next = n; Delete after item: save = item->next; item->next = item->next->next; Traverse: item = list; do { / Work with item / item = item->next; } while ( item != head ); All linked lists will follow some combination of the above conventions, so if you can recognize them all, you'll be able to read elementary list processing code without fail and be able to follow the basic logic of more complex algorithms. A quick note about single linked lists and double linked lists before we continue. When inserting a single linked item into a list, you need to update two links, both next pointers: n->next = item->next; / Set n's next pointer / item->next = n; / Set item's next pointer / However, with a double linked list, the number of links you need to fix doubles. If it helps you to remember, a double linked list is twice the work. n->prev = item; / n's prev pointer / n->next = item->next; / n's next pointer / n->next->prev = n; / n's next->prev pointer / n->prev->next = n; / n's prev->next pointer / Only after those four links have been fixed will the item be correctly inserted. However, if one of those links is a sentinel, you have a problem. So the correct code to insert a node anywhere into a double linked list is: n->prev = item; n->next = item->next; if ( n->prev != NULL ) n->prev->next = n; if ( n->next != NULL ) n->next->prev = n; List processing* All linked lists should be able to insert and delete an item. In fact, the minimal list library that I wrote for this tutorial has only four functions: insert, delete, find the first item and find the last item. Here is the header: #ifndef PRELIST_H #define PRELIST_H typedef enum { BEFORE, AFTER } Direction; typedef struct element Element; typedef struct element List; struct element { Element prev; Element next; int data; }; Element first(Element iter); Element last (Element iter); List add (List list, Element iter, Element item, Direction dir); List rem (List list, Element iter); #endif Simple, isn't it? For code that maintains both a head and tail pointer, the first and last functions will never even be used! The only reason I put them there is because I'm usually too lazy to maintain head and tail pointers. But just so that you know the issues, regular use of first and last can be a big performance hit for large lists. However, since the lists will be small in this tutorial, the performance problems are acceptable. On to the code! List reversal Often, being able to reverse a list in place is a useful operation. For example, when I'm truly lazy I'll sort and then reverse the list to find the largest item. Reversing a single linked list is very simple, just walk the list and build a new one by removing an item from the front of the old list and inserting it into the front of the new list. The result is an algorithm that does this: 0: Old: 1->2->3->4->~ New: ~ 1: Old: 2->3->4->~ New: 1->~ 2: Old: 3->4->~ New: 2->1->~ 3: Old: 4->~ New: 3->2->1->~ 4: Old: ~ New: 4->3->2->1->~ Pseudocode for this algorithm is as easy to write as it is to visualize: sub reverse ( list ) hold done todo done = null todo = list while todo != null hold = todo->next todo->next = done done = todo todo = hold loop return done end sub For double linked lists, you also have to take the prev pointer into account, but with careful thinking, this only adds one line to the algorithm. Here is the C/C++ source for reversing a double linked list: List reverse ( List list ) { Element hold; Element done; Element todo; done = NULL; todo = list; while ( todo != NULL ) { hold = todo->next; todo->next = done; done = todo; todo = hold; done->prev = todo; /* Fix the prev link / } return done; } List splicing* When I was first learning linked lists, the idea of splicing a part of one list into another was difficult. I thought that it must have been a long and convoluted algorithm that was far beyond me. Of course, when I realized that splicing is almost exactly the same as inserting a single item, I was slightly embarrassed. Splicing in a single linked list is awkward, so it really should be avoided. Double linked list splicing is ridiculously easy for most applications. Here is my minimal list splice function: List splice ( List list, Element pos, Element first, Element last, Direction dir ) { assert ( list != NULL ); assert ( pos != NULL ); assert ( first != NULL ); assert ( last != NULL ); if ( dir == BEFORE ) { first->prev = pos->prev; last->next = pos; } else { first->prev = pos; last->next = pos->next; } if ( first->prev != NULL ) first->prev->next = first; if ( last->next != NULL ) last->next->prev = last; return list; } The conditional statements are there to determine whether to add before or after the item in the list to add to, and to cover for possible null pointers. For example, if you splice to the front of a list, first->prev will be NULL, so setting first->prev->next will probably cause a segmentation violation because you're dereferencing a null pointer. Without the checks, and assuming that we will be inserting before the item, we have this: List splice ( List list, Element pos, Element first, Element last ) { first->prev = pos->prev; last->next = pos; first->prev->next = first; last->next->prev = last; return list; } See why I was embarrassed? List sorting Sorting a linked list is a fun exercise, so I'll leave the elementary sorting implementations up to the reader. I'm not going to give a detailed description of sorting because this tutorial is about linked lists, so here is the implementation of mergesort on my minimal list library: List sort ( List self, int (compare) ( const Element a, const Element b ) ) { Element p, q; Element e, tail; int insize = 1; int nmerges; int psize, qsize; int i; if ( self == NULL ) return self; for (;; ) { p = self; self = NULL; tail = NULL; nmerges = 0; while ( p ) { nmerges++; q = p; for ( psize = 0, i = 0; i < insize && q != NULL; psize++, i++ ) q = q->next; qsize = insize; while ( psize > 0 \|\| ( qsize > 0 && q != NULL ) ) { if ( psize == 0 ) { e = q; q = q->next; qsize--; } else if ( qsize == 0 \|\| q == NULL ) { e = p; p = p->next; psize--; } else if ( compare ( p, q ) <= 0 ) { e = p; p = p->next; psize--; } else { e = q; q = q->next; qsize--; } if ( tail != NULL ) tail->next = e; else self = e; e->prev = tail; tail = e; } p = q; } tail->next = NULL; if ( nmerges <= 1 ) return self; insize = 2; } } The mergesort algorithm is unchanged for linked lists, so you can open any book on algorithms and find out exactly how this code works. Mergesort is a good general sorting algorithm as it is guaranteed to run in NlogN time and has the advantage of stability. Competitors such as quicksort and heapsort are not stable, which is the deciding factor much of the time. Another alternative way of sorting a linked list is not obvious at first, but uses the fact that a binary tree is always sorted. Simply walk the list, removing the first item and inserting it into a binary tree. Then do an inorder traversal of the tree, inserting at the end of a new list. Often it makes sense to switch between data structures, despite what people may tell you. Which leads us to the next topic. List indexing Indexing a list is gives you the best of linked lists and arrays. You can insert and delete from a list easily and index an array easily. Of course, it takes some tweaking to get code that uses a list index to run efficiently and without a lot of awkward code, but it can be very useful. To make a list index, you need to know how many items are in the list. Then allocate a dynamic array of pointers to items and traverse the list, adding each item to the new array. Here is some simple code that does this: Element make_index ( List list, int n ) { Element item; Element index; int count; count = 0; for ( item = list; item != NULL; item = item->next ) count++; n = count; index = malloc ( count * sizeof index ); count = 0; for ( item = list; item != NULL; item = item->next ) index[count++] = item; return index; } Note that indexing can be expensive if the list changes constantly. Updating the index requires two traversals of the list and a call to malloc. Also, if you err on the side of efficiency and don't update the index of a constantly changing list, you will eventually have problems with indexing an old list but expecting a current one. Use this trick with care. Some useful techniques* Here are a few interesting and sometimes useful techniques with linked lists: Count the number of items matching a search: int count ( List list, int find ) { int count = 0; while ( list != NULL ) { if ( list->data == find ) count++; } return count; } Get the Nth item in a list: Element getnth ( List list, int index ) { int i; for ( i = 0; i < index; i++ ) { list = list->next; if ( list == NULL ) return NULL; } return list; } Destroy an entire list: This is a tricky situation because you can't free an item and then move to the next item. Many people incorrectly believe that this will work just fine: void destroy ( List list ) { while ( list != NULL ) { free ( list ); list = list->next; } } The correct way is to use a save pointer so that you never access memory that has been released: void destroy ( List list ) { Element save; while ( list != NULL ) { save = list->next; free ( list ); list = save; } } Remove duplicates: void unique ( List list ) { Element curr = list; Element save; /* Empty list / if ( curr == NULL ) return; while ( curr->next != NULL ) { if ( curr->data == curr->next->data ) { save = curr->next->next; free ( curr->next ); curr->next = save; } else curr = curr->next; } } Well, after all of that you're probably wanting to see my minimal list library implementation, so to cap off the tutorial, here is the source code for List.c: #include <stdlib.h> #include <assert.h> #include "List.h" static void insBefore ( Element iter, Element item ) { assert ( iter != NULL ); assert ( item != NULL ); item->prev = iter->prev; item->next = iter; if ( iter->prev != NULL ) iter->prev->next = item; iter->prev = item; } static void insAfter ( Element iter, Element item ) { assert ( iter != NULL ); assert ( item != NULL ); item->prev = iter; item->next = iter->next; if ( iter->next != NULL ) iter->next->prev = item; iter->next = item; } Element first(Element iter) { while ( iter != NULL && iter->prev != NULL ) iter = iter->prev; return iter; } Element last ( Element iter ) { while ( iter != NULL && iter->next != NULL ) iter = iter->next; return iter; } List add ( List list, Element iter, Element item, Direction dir ) { assert( item != NULL ); if ( list == NULL ) / Empty list / list = item; else { if ( iter == list && dir == BEFORE ) { / Before head / insBefore ( list, item ); list = item; } else { if ( dir == BEFORE ) / Before internal node / insBefore ( iter, item ); else / After internal node */ insAfter ( iter, item ); } } return list; } List rem ( List list, Element iter ) { assert ( list != NULL ); assert ( iter != NULL ); if ( iter->prev != NULL ) iter->prev->next = iter->next; if ( iter->next != NULL ) iter->next->prev = iter->prev; if ( iter == list ) list = iter->next; free ( iter ); return list; } Not much bigger than the header, is it? Linked lists can be as simple or complex as you want them to be, but down at the core, they're a piece of cake. Have fun!

Script provided by SmartCGIs