Archive for January, 2010

What you don’t want to know about Java boxed primitives

January 9, 2010 1 comment

In my last post I used a simple calculation to argument that a List containing 1024 Integer objects will take at least 8kb on a 32bit system, and at least 12kb on a 64bit system. I then also argued that additional storage for 1024 Java objects is needed (some of them might be cached, but let’s talk about that later). Unfortunately however, I did not investigate about how much that would be until Ortwin Glück, who runs the Java Anti-Patterns page, wrote me an email with some very interesting numbers. So I started doing some research:

How can you reliably obtain the size of a random Java object? Well, there are basically two answers: You can use a hack that is outlined here, or you can use Instrumentation.getObjectSize(Object obj). Still, obtaining an implementation of Instrumentation and transversing object graphs is far more work than I want to invest. Luckily there is java.sizeOf which does exactly that for us. java.sizeOf basically is a Java agent and a library to access the functionality of it, packaged into a single class called SizeOf. To see what it can do, I wrote this small program:

package com.wordpress.mlangc.sizeof;

import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import net.sourceforge.sizeof.SizeOf;

public class TestSizeOf
    private static final String WTF = ":-O";
    private static final int SIZE = 1024;
    public static void main(final String[] args)
        printSize(new Object());
        printSize(new int[SIZE]);
        List<Integer> intList = new ArrayList<Integer>(SIZE);
        for(int i = 0; i != SIZE; ++i)
    private static void printSize(final Object obj)
        System.out.println("sizeof(" + describeObject(obj) + "): " 
                + SizeOf.humanReadable(SizeOf.deepSizeOf((obj))));
    private static String describeObject(final Object obj)
        Class<?> clazz = obj.getClass();
        StringBuilder sb = new StringBuilder(clazz.getCanonicalName());
                throw new AssertionError(WTF);
            sb.insert(sb.length() - 1, Array.getLength(obj));
        else if(obj instanceof Collection<?>)
            Collection<?> collection = (Collection<?>) obj;
        return sb.toString();
    private static String extractGenericTypeFromCollectionHack(final Collection<?> collection)
        //It is impossible to do this reliably because of type erasure;
        //for our purposes however, this simple hack,
        //that just inspects the first element, should do nicely.
            return "?";
        return collection.iterator().next().getClass().getCanonicalName();

Then I executed the code above like this (the -javaagent option is important):

$ java -javaagent:${LIB_PATH}/SizeOf.jar com.wordpress.mlangc.sizeof.TestSizeOf
JAVAGENT: call premain instrumentation for class SizeOf
sizeof(java.lang.Object): 16.0b
sizeof(java.lang.Integer): 24.0b
sizeof(int[1024]): 4.0234375Kb
sizeof(java.util.ArrayList<java.lang.Integer>[1024]): 32.0625Kb

So, yes a plain and mostly useless Java Object consumes 16bytes on my system (x86_64 with sun-jdk-1.6.0_17) – thats already 4 int values! An Integer object swallows 24bytes (I would have guessed 20 = 16 + 4 – maybe this has something to do with alignment?), which is 6 int values. An array of 1024 int values needs 4kb as it should, but an ArrayList<Integer> swallows 32kb = 24kb + 8kb and therefore consumes unbelievable 8 times more memory than the array.

So what do I suggest considering the disturbing facts outlined above?

  • Prefer primitive types to boxed primitives as suggested by EffJava2 Item 49. Not only are they more efficient, the fact that they cannot be null makes them more attractive for other reasons too (albeit not always).
  • If you use wrapped primitives, prefer autoboxing or the Foo.valueOf(foo f) factory methods to direct constructors calls, as this is (citing the Javadocs) “likely to yield significantly better space and time performance by caching frequently requested values”. This is especially important for Boolean, where autoboxing or valueOf(…) never creates a new object.
  • Be aware of the fact, that Collections containing boxed primitives are huge memory wasters compared to arrays, even when caching is considered, but keep in mind, that premature optimization is the root of all evil.

What you don’t want to know about Java Collections

January 7, 2010 1 comment

In one of my previous posts I wrote (I deleted “very” later, after the post was already published, as it seemed to strong):

Prefer lists to arrays unless you have a very good reason not to.

This was mostly in reaction to Lists are overrated, which has since been edited. Before anything else, l want to state clearly that I still stick to that statement. Unfortunately, due deficiencies in the Java language design and standard library APIs, some of these “good reasons” are not obvious:

Lists containing lots of boxed primitives are huge memory wasters: This isn’t really surprising, if you just do the math. Imagine you have 1024 integers. If you store them in an int array, you should be able to get away with about 4kb of memory. But storing them in a list of boxed primitives will take at least 8kb on a 32bit system, and at least 12kb on a 64bit system, because we now also need an additional object reference for every int value we store. Last but not least, we also need an implementation dependent amount of storage for 1024 Java objects, that make matters even worse. Note that this reasoning isn’t limited to Collections, but to boxed primitives in general (see also Item 49 “Prefer primitive types to boxed primitives” in EffJava2).

Collections.sort(…) favors idiots over experts: Let me cite the relevant Javadocs:

This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array. This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place.

So, yes, the implementors of the Java standard library favor programmers who are clueless enough to sort huge linked lists, over programmers that did their homework. In fact, this simple Qicksort implementation, that sorts the given List in place, outperforms Collections.sort(…) by about 20% for ArrayList<Integer> and doesn’t needlessly allocate possibly huge amounts of memory:

static <T extends Comparable<? super T>> void qsort(final List<T> list)
    qsort(list, 0, list.size());
private static <T extends Comparable<? super T>> void qsort(final List<T> list, final int start, final int end)
    // Handle simple cases:
    if(end - start < 2)
    if(end - start == 2)
        if(list.get(start).compareTo(list.get(end - 1)) > 0)
            swap(list, start, end - 1);
   // Select a new pivot:
   int i1 = start;
   int i2 =  start + (end - start)/2;
   int i3 = end - 1;
   T p1 = list.get(i1);
   T p2 = list.get(i2);
   T p3 = list.get(i3);
   int pivotInd = -1;
   if(p1.compareTo(p2) == -p1.compareTo(p3))
       pivotInd = i1;
   else if(p2.compareTo(p1) == -p2.compareTo(p3))
       pivotInd = i2;
       pivotInd = i3;
   T pivot = list.get(pivotInd);
   // Partition the list using the selected pivot:
   swap(list, pivotInd, end - 1);
   int storeInd = start;
   for(int i = start; i != end - 1; ++i)
       if(pivot.compareTo(list.get(i)) > 0)
           swap(list, i, storeInd++);
    swap(list, storeInd, end - 1);
    // Recurse:
    qsort(list, start, storeInd);
    qsort(list, storeInd + 1, end);

private static <T> void swap(final List<T> list, final int i1, final int i2)
    T tmp = list.get(i1);
    list.set(i1, list.get(i2));
    list.set(i2, tmp);

If you don’t believe me, try it yourself – but please don’t use this code in production without testing it thoroughly.

Categories: Programming Tags: ,