By Yakov Fain

I was experimenting with Java HashSet, which is a pretty expensive collection to create, but it has a benefit of the O(1) performance while finding the elements from this collection. Based on my experiments performance of HashSet is improved over the last year.

I’ve written a small benchmark comparing the performance of the one year old JRE 1.8.0_05 with the latest 1.8.0_60. In my tests I’m creating a HashSet containing 100000 objects. The object looks like this:

import java.math.BigDecimal;
 
public class MyObject {
 
    public String s1 = "aaaaaaaaaaaaa";
    public Double d1 = 222222222222.22;
    public BigDecimal b1 = new BigDecimal(1.54);
    public int i1;
     
    public String s2 = "aaaaaaaaaaaaa&amp";
    public Double d2 = 222222222222.22;
    public BigDecimal b2 = new BigDecimal(1.54);
    public int i2;
     
    public String s3 = "aaaaaaaaaaaaa";
    public Double d3 = 222222222222.22;
    public BigDecimal b3 = new BigDecimal(1.54);
    public int i3;  
}

My test program runs two test loops. First, I preallocate the memory for the HashSet capable of storing 133000 object with the load factor 0.75. In the second loop I create the HashSet with the default initial size 16 and the same load factor. The load factor 0.75 causes the HashSet to grow if the number of elements becomes greater than 75% of the initial capacity. I was expecting the first loop to perform a little better because there is no need to do additional memory allocations. Here’s the test program:

import java.time.Duration;
import java.time.LocalTime;
import java.util.HashSet;
 
public class Main {
 
    static int iterations = 100000;
    static float loadFactor=0.75f;
    static int initialSize = (int)Math.round(iterations/loadFactor);
     
    public static void main(String[] args) {
        
       int nTests = 10;
       long totalTime = 0; 
        
       // HashSet with large initial size
       for (int i=0; i<nTests; i++){
        totalTime += populateHashSet(initialSize, loadFactor);                  
       }
        
       System.out.println("With JRE " + System.getProperty("java.version") + " the average time (in milis) to populate the HashSet of initial size " + initialSize + " is "+ totalTime/nTests);
       
        
       // HashSet with default initial size
       initialSize = 16;  // default for HandSet
       totalTime = 0; 
        
       for (int i=0; i < nTests; i++){
        totalTime += populateHashSet(initialSize, loadFactor);                  
       }
        
       System.out.println("With JRE &amp;quot; + System.getProperty("java.version") + " the average time (in milis) to populate the HashSet of initial size " + initialSize + " is "+ totalTime/nTests);
    }
     
 
    static long populateHashSet(int size, float loadFactor){
      
        System.gc();
         
        HashSet&amp<MyObject> hashSet = new HashSet<>(initialSize, loadFactor);
         
            LocalTime before = LocalTime.now(); 
             
     
            for (int i =0; i < iterations; i++){
                MyObject obj = new MyObject();
                obj.i1 = i*2; 
                hashSet.add(obj);
            }
         
        LocalTime after = LocalTime.now();
         
       return Duration.between(before, after).toMillis();       
                 
    }
}

Running this program in two different JREs shows the following results:

With JRE 1.8.0_05 the average time (in milis) to populate the HashSet of initial size 133333 is 167
With JRE 1.8.0_05 the average time (in milis) to populate the HashSet of initial size 16 is 152

With JRE 1.8.0_60 the average time (in milis) to populate the HashSet of initial size 133333 is 153
With JRE 1.8.0_60 the average time (in milis) to populate the HashSet of initial size 16 is 141

My test shows that the HashSet in JRE 1.8.0_60 gets populated about 10% faster than with JRE 1.8.0_05. If your application works with large HashSets you should upgrade your JRE.

What I can’t explain is why the numbers with default initial size are better than with the HashSet with pre-allocated memory.

 

Java HashSet Becomes a Little Faster

About The Author
-

2 Comments

  • Michael Goossens
    Reply

    I agree. You cant conclude anything from this test. Too small, system load at the time of test has a lot of influence to the result, scope is too small and the eventual difference is neglectable (an OS task could turn the result around).

    The system.gc is also unreliable, don’t use it, the only thing it can do is make your test result even more unreliable.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>