Posts Tagged ‘Collections’

A Note About Sets and Maps

February 18, 2018 Leave a comment

Recently I’ve repeatedly stumbled over code like this:

Set<String> someSet = new HashSet<>(2);

Obviously the intention here was initialize a HashSet for 2 elements. What’s not so obvious, is that this code actually allocates a HashSet that might hold at most a single element, that is subsequently resized to contain up to a maximum of 3 elements when "bar" is added. The explanation for this in fact counterintuitive behavior can be found in the Javadocs, which state

public HashSet​(int initialCapacity)
Constructs a new, empty set; the backing HashMap instance has the specified initial capacity and default load factor (0.75).

And while this is still slightly vague about the concrete meaning of initialCapacity, the parameter description is rather explicit:

initialCapacity – the initial capacity of the hash table

This means that rather then telling you how many elements you can put into your set before it might have to be resized, this parameter specifies the size of the internally used hash table, that given the default load factor of 0.75, will be enlarged as soon as there are more than 0.75*TABLE_SIZE elements in the set (even if some or all of them end up in the same hash bucket). Thus, to correctly initialize a HashSet for 2 elements, one has to call

Set<String> someSet = new HashSet<>(3);

which does the same as

Set<String> someSet = new HashSet<>(4);

since the internal table size is always a power of 2.

I’d recommend using though, that addresses exactly the problem that is being discussed here:

public static HashSet newHashSetWithExpectedSize(int expectedSize)
Returns a new hash set using the smallest initial table size that can hold expectedSize elements without resizing. Note that this is not what HashSet.HashSet(int) does, but it is what most users want and expect it to do.

An even better alternative that leads to more concise code at the same time is to use, E).

Last but not least, it should be mentioned that exactly the same reasoning applies to HashMap that is actually used to implement HashSet internally.

Categories: Java, Programming Tags: ,

Be Careful When Converting Java Arrays to Lists

May 1, 2010 15 comments

Unfortunately not everything that should be trivial actually is. One example is converting Java arrays to lists. Of course, there is Arrays.toList, but using this method carelessly will almost certainly lead to nasty surprises. To see what I mean consider the following program and try to predict its output:

package com.wordpress.mlangc.arrays;

import java.util.Arrays;

public class ArraysToList
    public static void main(final String[] args)
                Arrays.asList(new String[] { "a", "b" }));
                Arrays.asList(new Integer[] { 1, 2 }));
                Arrays.asList(new int[] { 1, 2 }));
                Arrays.asList(new String[] { "a", "b" }, "c"));

As the Javadocs for Arrays.asList are quite vague, I can’t blame you for having some difficulties coming to a conclusion, so here is the answer step by step:

  • Line 9 prints “[a, b]” to our shiny little console which is pretty much what one would expect from a sane API, so we are happy.
  • The same is true for line 12 which results in “[1, 2]”.
  • Line 15 however is different, not only because 15 is not 12, but also because an int is not an Integer, and therefore prints “[[I@39172e08]” to our console, that is not shiny anymore. Instead of a list containing two Integer objects, we got a list containing the array as its sole element.
  • After what we have seen above, it should not take you by surprise that line 18 results in another mess that looks like “[[Ljava.lang.String;@20cf2c80, c]”.

So, what happened? The first two print statements worked as expected, because the Java Language Specification states that calling a method with signature foo(T… t) like foo(new T[] { bar, baz }) is semantically equivalent to foo(bar, baz). In Arrays.asList T is a type parameter, so it has to be an Object, and this is not true for int, but for int[]. Thats why the statement in line 16 is equivalent to

Arrays.asList(new Object[] { new int[] { 1, 2 } })

Last but not least, the statement in line 19 called for trouble from the very beginning. We told the compiler that we want a list containing an array of strings and a string, which is exactly what we got.

So far for the explanation, but there is something else we can learn from that: The real source of confusion is not that varargs feature is badly designed; I would rather say that the opposite is true. The problem is that Arrays.asList violates EffJava2 Item 42, which explains clearly, in fact giving Arrays.asList as a bad example, why you should be quite careful when designing APIs that use Java varargs. I won’t repeat the reasoning of the book here, as you really should read it yourself, but for the sake of completeness I have to point out that the problematic statements from above would have been rejected by the compiler back in the old days of Java 1.4, which was a good thing. We can still use Arrays.asList today, but doing so safely requires us to be aware of the subtillities we are facing. Here are a few rules for converting arrays to lists, that guarantee nothing unexpected happens:

  • If you convert to a list just to convert to a string, use Arrays.toString instead. It does what you want all the time and also works for arrays of primitives.
  • If you want to convert an array of primitives to a list of boxed primitives, take advantage of Apache Commons Lang, which most likely is a dependency of your project already anyway, and use ArrayUtils.toObject like this:

    List<Integer> list = Arrays.asList(ArrayUtils.toObject(new int[] { 1, 2 }));

    Note however that lists of boxed primitives should not generally be preferred over arrays containing primitives.

  • If you want to convert an array of object references, use Arrays.asList directly:

     List<String> list = 
               Arrays.asList(new String[] { "a", "b" });

    Don’t forget to make sure that the people you work with won’t emulate you carelessly.

Of course, you can also choose to just remember that Arrays.asList might behave unexpectedly and use plain for loops instead, but that clutters your code and comes with a performance penalty.

Categories: Programming Tags: , ,

What you don’t want to know about Java boxed primitives

January 9, 2010 1 comment

In my last post I used a simple calculation to argument that a List containing 1024 Integer objects will take at least 8kb on a 32bit system, and at least 12kb on a 64bit system. I then also argued that additional storage for 1024 Java objects is needed (some of them might be cached, but let’s talk about that later). Unfortunately however, I did not investigate about how much that would be until Ortwin Glück, who runs the Java Anti-Patterns page, wrote me an email with some very interesting numbers. So I started doing some research:

How can you reliably obtain the size of a random Java object? Well, there are basically two answers: You can use a hack that is outlined here, or you can use Instrumentation.getObjectSize(Object obj). Still, obtaining an implementation of Instrumentation and transversing object graphs is far more work than I want to invest. Luckily there is java.sizeOf which does exactly that for us. java.sizeOf basically is a Java agent and a library to access the functionality of it, packaged into a single class called SizeOf. To see what it can do, I wrote this small program:

package com.wordpress.mlangc.sizeof;

import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import net.sourceforge.sizeof.SizeOf;

public class TestSizeOf
    private static final String WTF = ":-O";
    private static final int SIZE = 1024;
    public static void main(final String[] args)
        printSize(new Object());
        printSize(new int[SIZE]);
        List<Integer> intList = new ArrayList<Integer>(SIZE);
        for(int i = 0; i != SIZE; ++i)
    private static void printSize(final Object obj)
        System.out.println("sizeof(" + describeObject(obj) + "): " 
                + SizeOf.humanReadable(SizeOf.deepSizeOf((obj))));
    private static String describeObject(final Object obj)
        Class<?> clazz = obj.getClass();
        StringBuilder sb = new StringBuilder(clazz.getCanonicalName());
                throw new AssertionError(WTF);
            sb.insert(sb.length() - 1, Array.getLength(obj));
        else if(obj instanceof Collection<?>)
            Collection<?> collection = (Collection<?>) obj;
        return sb.toString();
    private static String extractGenericTypeFromCollectionHack(final Collection<?> collection)
        //It is impossible to do this reliably because of type erasure;
        //for our purposes however, this simple hack,
        //that just inspects the first element, should do nicely.
            return "?";
        return collection.iterator().next().getClass().getCanonicalName();

Then I executed the code above like this (the -javaagent option is important):

$ java -javaagent:${LIB_PATH}/SizeOf.jar com.wordpress.mlangc.sizeof.TestSizeOf
JAVAGENT: call premain instrumentation for class SizeOf
sizeof(java.lang.Object): 16.0b
sizeof(java.lang.Integer): 24.0b
sizeof(int[1024]): 4.0234375Kb
sizeof(java.util.ArrayList<java.lang.Integer>[1024]): 32.0625Kb

So, yes a plain and mostly useless Java Object consumes 16bytes on my system (x86_64 with sun-jdk-1.6.0_17) – thats already 4 int values! An Integer object swallows 24bytes (I would have guessed 20 = 16 + 4 – maybe this has something to do with alignment?), which is 6 int values. An array of 1024 int values needs 4kb as it should, but an ArrayList<Integer> swallows 32kb = 24kb + 8kb and therefore consumes unbelievable 8 times more memory than the array.

So what do I suggest considering the disturbing facts outlined above?

  • Prefer primitive types to boxed primitives as suggested by EffJava2 Item 49. Not only are they more efficient, the fact that they cannot be null makes them more attractive for other reasons too (albeit not always).
  • If you use wrapped primitives, prefer autoboxing or the Foo.valueOf(foo f) factory methods to direct constructors calls, as this is (citing the Javadocs) “likely to yield significantly better space and time performance by caching frequently requested values”. This is especially important for Boolean, where autoboxing or valueOf(…) never creates a new object.
  • Be aware of the fact, that Collections containing boxed primitives are huge memory wasters compared to arrays, even when caching is considered, but keep in mind, that premature optimization is the root of all evil.

What you don’t want to know about Java Collections

January 7, 2010 1 comment

In one of my previous posts I wrote (I deleted “very” later, after the post was already published, as it seemed to strong):

Prefer lists to arrays unless you have a very good reason not to.

This was mostly in reaction to Lists are overrated, which has since been edited. Before anything else, l want to state clearly that I still stick to that statement. Unfortunately, due deficiencies in the Java language design and standard library APIs, some of these “good reasons” are not obvious:

Lists containing lots of boxed primitives are huge memory wasters: This isn’t really surprising, if you just do the math. Imagine you have 1024 integers. If you store them in an int array, you should be able to get away with about 4kb of memory. But storing them in a list of boxed primitives will take at least 8kb on a 32bit system, and at least 12kb on a 64bit system, because we now also need an additional object reference for every int value we store. Last but not least, we also need an implementation dependent amount of storage for 1024 Java objects, that make matters even worse. Note that this reasoning isn’t limited to Collections, but to boxed primitives in general (see also Item 49 “Prefer primitive types to boxed primitives” in EffJava2).

Collections.sort(…) favors idiots over experts: Let me cite the relevant Javadocs:

This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array. This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place.

So, yes, the implementors of the Java standard library favor programmers who are clueless enough to sort huge linked lists, over programmers that did their homework. In fact, this simple Qicksort implementation, that sorts the given List in place, outperforms Collections.sort(…) by about 20% for ArrayList<Integer> and doesn’t needlessly allocate possibly huge amounts of memory:

static <T extends Comparable<? super T>> void qsort(final List<T> list)
    qsort(list, 0, list.size());
private static <T extends Comparable<? super T>> void qsort(final List<T> list, final int start, final int end)
    // Handle simple cases:
    if(end - start < 2)
    if(end - start == 2)
        if(list.get(start).compareTo(list.get(end - 1)) > 0)
            swap(list, start, end - 1);
   // Select a new pivot:
   int i1 = start;
   int i2 =  start + (end - start)/2;
   int i3 = end - 1;
   T p1 = list.get(i1);
   T p2 = list.get(i2);
   T p3 = list.get(i3);
   int pivotInd = -1;
   if(p1.compareTo(p2) == -p1.compareTo(p3))
       pivotInd = i1;
   else if(p2.compareTo(p1) == -p2.compareTo(p3))
       pivotInd = i2;
       pivotInd = i3;
   T pivot = list.get(pivotInd);
   // Partition the list using the selected pivot:
   swap(list, pivotInd, end - 1);
   int storeInd = start;
   for(int i = start; i != end - 1; ++i)
       if(pivot.compareTo(list.get(i)) > 0)
           swap(list, i, storeInd++);
    swap(list, storeInd, end - 1);
    // Recurse:
    qsort(list, start, storeInd);
    qsort(list, storeInd + 1, end);

private static <T> void swap(final List<T> list, final int i1, final int i2)
    T tmp = list.get(i1);
    list.set(i1, list.get(i2));
    list.set(i2, tmp);

If you don’t believe me, try it yourself – but please don’t use this code in production without testing it thoroughly.

Categories: Programming Tags: ,