Saturday, March 29, 2014

What happened to synchronized in Java 8?

I was curious how the AtomicXXX classes compare in performance with Locks or synchronized methods under high contention in a multi-threaded context, so I wrote a micro-benchmark that performs this comparison with various numbers of concurrent threads.

What I am testing are implementations of the following interface:

public interface LongCounter {
  long incrementAndGet();
  long decrementAndGet();
  long addAndGet(long value);
  long get();
  void set(long value);
}

Implementations use the following patterns:

with AtomicLong:

import java.util.concurrent.atomic.AtomicLong;

public class LongCounterAtomic implements LongCounter {
  private final AtomicLong value = new AtomicLong(0L);
  @Override public long incrementAndGet() {
    return value.incrementAndGet();
  }
  // ... other methods ...
}

with ReentrantLock:

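import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LongCounterLock implements LongCounter {
  private long value = 0L;
  private final Lock lock = new ReentrantLock();
  @Override public long incrementAndGet() {
    lock.lock();
    try {
      return ++value;
    } finally {
      // always release the lock, even if the body throws
      lock.unlock();
    }
  }
  // ... other methods ...
}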

with synchronized methods:

public class LongCounterSynchronized implements LongCounter {
  private long value = 0L;
  @Override public synchronized long incrementAndGet() {
    return ++value;
  }
  // ... other methods ...
}

The test then performs various invocations of the interface methods on a single instance of one of the implementations, which is shared by all the threads in the test. This is done for a specified number of iterations. The number of threads varies from 1 to 256 and the total number of iterations is fixed, which means that the more threads there are, the fewer iterations each thread has to perform. The parameters for a test are thus the LongCounter implementation instance, the total number of iterations and the number of concurrent threads.
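To make that setup concrete, here is a minimal sketch of such a test loop. It is not the actual benchmark code: the class CounterBenchSketch, the cut-down Counter interface, the AtomicCounter helper and the run method are all names made up for this illustration, and the choice of doubling the thread count at each step is an assumption. It only calls incrementAndGet and measures elapsed time, not CPU load.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CounterBenchSketch {
  /** Cut-down stand-in for LongCounter: only the method exercised here. */
  interface Counter {
    long incrementAndGet();
  }

  /** Hypothetical implementation, just to make the sketch runnable. */
  static class AtomicCounter implements Counter {
    private final AtomicLong value = new AtomicLong(0L);
    @Override public long incrementAndGet() { return value.incrementAndGet(); }
    long get() { return value.get(); }
  }

  /**
   * Splits totalIterations evenly across nbThreads threads that all share
   * the same counter, and returns the elapsed time in nanoseconds.
   * Any remainder of the division is dropped, and thread start-up time
   * is included in the measurement.
   */
  static long run(final Counter counter, long totalIterations, int nbThreads)
      throws InterruptedException {
    final long perThread = totalIterations / nbThreads;
    Thread[] threads = new Thread[nbThreads];
    for (int i = 0; i < nbThreads; i++) {
      threads[i] = new Thread(new Runnable() {
        @Override public void run() {
          for (long n = 0; n < perThread; n++) {
            counter.incrementAndGet();
          }
        }
      });
    }
    long begin = System.nanoTime();
    for (Thread t : threads) t.start();
    for (Thread t : threads) t.join();
    return System.nanoTime() - begin;
  }

  public static void main(String[] args) throws InterruptedException {
    long total = 1L << 20; // fixed total, divisible by every thread count below
    for (int nbThreads = 1; nbThreads <= 256; nbThreads *= 2) {
      AtomicCounter counter = new AtomicCounter();
      long nanos = run(counter, total, nbThreads);
      System.out.println(nbThreads + " threads: " + (nanos / 1000000L)
          + " ms, final value " + counter.get());
    }
  }
}
```

A serious measurement would also need warm-up runs so that the JIT compiler has done its work before timing starts; the sketch leaves that out for brevity.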

The benchmark measures the elapsed time for each test, as well as the average CPU load during the test. Testing with Java 7 on an Intel i7-2600 (4 cores, 8 hardware threads) gives the following results:

So far, so good. It is interesting to see how the Atomic implementation lags well behind the others. Its higher CPU load is to be expected, since AtomicLong uses lock-free algorithms for thread-safe access to the value: under contention, a thread that fails to update the value retries instead of blocking.
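As a rough illustration of why a lock-free counter keeps the cores busy, here is a minimal sketch of a compare-and-set retry loop built on AtomicLong.compareAndSet. The class name CasIncrementSketch is made up for this example, and I am not claiming this is how the JDK implements incrementAndGet in any given version.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasIncrementSketch {
  private final AtomicLong value = new AtomicLong(0L);

  /** Lock-free increment: read, compute, then retry until the CAS succeeds. */
  public long incrementAndGet() {
    for (;;) {
      long current = value.get();
      long next = current + 1;
      if (value.compareAndSet(current, next)) {
        return next;
      }
      // Another thread changed the value in between: loop and try again.
      // The thread never blocks, so it burns CPU while it keeps losing the race.
    }
  }

  public long get() {
    return value.get();
  }
}
```

With many threads hammering the same value, most attempts can fail and be retried, which would show up as high CPU load without a matching gain in throughput.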

Now, what really threw me off were the results I obtained when I ran the exact same code with Java 8 (and recompiled it to the Java 8 class format):

Wow! What happened to the synchronized stuff? It's now 5 times slower than in Java 7. To be fair, the performance of AtomicLong has greatly improved: it is twice as fast. This is quite puzzling, and I'm wondering what's going on. I would appreciate any feedback on this. Maybe I'm not testing the right way, or something has dramatically changed in how the JVM handles synchronized methods and blocks.

For reference, I have posted the source code here and the full benchmark results here (as a PDF).

I will be grateful for any input the community can provide, thanks a lot!