What you don’t want to know about Java Collections
In one of my previous posts I wrote (I deleted “very” later, after the post was already published, as it seemed to strong):
Prefer lists to arrays unless you have a
verygood reason not to.
This was mostly in reaction to Lists are overrated, which has since been edited. Before anything else, l want to state clearly that I still stick to that statement. Unfortunately, due deficiencies in the Java language design and standard library APIs, some of these “good reasons” are not obvious:
Lists containing lots of boxed primitives are huge memory wasters: This isn’t really surprising, if you just do the math. Imagine you have 1024 integers. If you store them in an int array, you should be able to get away with about 4kb of memory. But storing them in a list of boxed primitives will take at least 8kb on a 32bit system, and at least 12kb on a 64bit system, because we now also need an additional object reference for every int value we store. Last but not least, we also need an implementation dependent amount of storage for 1024 Java objects, that make matters even worse. Note that this reasoning isn’t limited to Collections, but to boxed primitives in general (see also Item 49 “Prefer primitive types to boxed primitives” in EffJava2).
Collections.sort(…) favors idiots over experts: Let me cite the relevant Javadocs:
This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array. This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place.
So, yes, the implementors of the Java standard library favor programmers who are clueless enough to sort huge linked lists, over programmers that did their homework. In fact, this simple Qicksort implementation, that sorts the given List in place, outperforms Collections.sort(…) by about 20% for ArrayList<Integer> and doesn’t needlessly allocate possibly huge amounts of memory:
static <T extends Comparable<? super T>> void qsort(final List<T> list)
{
qsort(list, 0, list.size());
}
private static <T extends Comparable<? super T>> void qsort(final List<T> list, final int start, final int end)
{
// Handle simple cases:
if(end - start < 2)
return;
if(end - start == 2)
{
if(list.get(start).compareTo(list.get(end - 1)) > 0)
swap(list, start, end - 1);
return;
}
// Select a new pivot:
int i1 = start;
int i2 = start + (end - start)/2;
int i3 = end - 1;
T p1 = list.get(i1);
T p2 = list.get(i2);
T p3 = list.get(i3);
int pivotInd = -1;
if(p1.compareTo(p2) == -p1.compareTo(p3))
pivotInd = i1;
else if(p2.compareTo(p1) == -p2.compareTo(p3))
pivotInd = i2;
else
pivotInd = i3;
T pivot = list.get(pivotInd);
// Partition the list using the selected pivot:
swap(list, pivotInd, end - 1);
int storeInd = start;
for(int i = start; i != end - 1; ++i)
{
if(pivot.compareTo(list.get(i)) > 0)
swap(list, i, storeInd++);
}
swap(list, storeInd, end - 1);
// Recurse:
qsort(list, start, storeInd);
qsort(list, storeInd + 1, end);
}
private static <T> void swap(final List<T> list, final int i1, final int i2)
{
T tmp = list.get(i1);
list.set(i1, list.get(i2));
list.set(i2, tmp);
}
If you don’t believe me, try it yourself – but please don’t use this code in production without testing it thoroughly.
-
January 9, 2010 at 18:39 | #1What you don’t want to know about Java boxed primitives « Mlangc's Blog