Adopting to using Spring Data JPA these day, there is a post saying: IDENTITY generator disables JDBC batch inserts. To figure out the impact, create a table with 10 data fields and an auto-increment id for testing. I am using MySQL 5.7.20 / MariaDB 10.3.3 / Spring Data JPA 1.11.8 / Hibernate 5.0.12.

And generate the persistence entity, add @GeneratedValue annotation:

My benchmark runs to batch insert 2000 records in 1/2/4/8/16/32 concurrent threads.


When using GenerationType.IDENTITY, result looks like:

As mentioned, Hibernate/JPA disables batch insert when using IDENTITY. Look into org.hibernate.event.internal.AbstractSaveEventListener#saveWithGeneratedId() for details. To make it clear, it DOES run faster when insert multiple entities in one transaction than in separated transactions. It saves transaction overhead, not round-trip overhead.

The generated key is eventually retrieved from java.sql.Statement#getGeneratedKeys(). And datasource-proxy is used to display the underlining SQL generated.


Now switch to GenerationType.TABLE. Just uncomment the corresponding @GeneratedValue and @TableGenerator annotation. Result looks like:

To fix Hibernate deprecation warning and get better performance, add the line to

I began to think that was the whole story for batch, and the datasource-proxy interceptor also traced down the batch SQL. But after I looked into dumped TCP packages using wireshark, I found the final SQL was still not in batch format. Say, they were in:

Instead of:

The latter one saves client/server round-trips and is recommended by MySQL. After adding rewriteBatchedStatements=true to my connection string, MySQL generated batch statements and result was much improved:


Last switch to GenerationType.SEQUENCE. Sequence is a new feature added in MariaDB 10.3 series. Create a sequence in MariaDB with:

Generally, the increment should match the one specified in @SequenceGenerator, at least >= allocationSize. See

Hibernate apparently does not support the new feature, I dealt with it by adding a new dialect:

And add configuration:

supportsSequences() adds the sequence support. supportsPooledSequences() adds some pool-like optimization both supported by MariaDB and Hibernate. Otherwise, Hibernate uses tables to mimic sequences. Refer to Result with and without batch:

Dramatically improved when compared to the table generator. A sequence generator uses cache in memory(default 1000), and is optimized to eliminate lock when generating IDs.

4. Summary

1 thread 2 threads 4 threads 8 threads 16 threads 32 threads
IDENTITY 823 609 1188 2329 4577 9579
TABLE 830 854 1775 3479 6542 13768
TABLE with batch 433 409 708 1566 2926 6388
SEQUENCE 723 615 1147 2195 4687 9312
SEQUENCE with batch 298 155 186 356 695 1545

From the summary table, IDENTITY is simplest. TABLE is a compromise to support batch insert. And SEQUENCE yields the best performance. Find the entire project in Github.

While reading Understanding the JVM – Advanced Features and Best Practices, Second Edition (in Chinese) recently, there is a guide in chapter one to build JVM from source. It is based on OpenJDK7, which only works when using a Java6/Java7 VM as build bootstrap. Java8 bootstrap has more strict code checks and will finally fail the build. So, I just switched to a more recent OpenJDK8 code. The file name is

The code provides a better build experience, and compiles on my Linux box almost out of box. But remember, do not use a gcc compiler >= gcc-6. It defaults to C++14 and breaks the build. On macOS, the build scripts seem only support gcc. Actually, a clang compiler is required to build the objc code.

1. So the first step after downloading and unzipping the code, modify the configure script:

Comment out the lines(2 appearances):

2. Now install freetype and run configure:

A slowdebug build disables optimization and helps a lot when debugging. The summary output looks like:

3. Apply the following patch to fix build errors. Partially picked from an official OpenJDK10 changeset:

4. Start the build:

Lots of warnings, but the build should finish successfully:

5. Debugging with lldb:

The output binary lie in openjdk/build/macosx-x86_64-normal-server-slowdebug/jdk. The output of java -version:

Never used lldb before, seems to be compatible with gdb:

The article is inspired by the posts here and here.

There is a RESTful service as the infrastructure for data access in our team. It is based on Jersey/JAX-RS and runs fast. However, it consumes large memory when constructing large data set as response. Since it builds the entire response in memory before sending it.

As suggested in the above posts. Streaming is the solution. They integrated Hibernate or Spring Data for easy adoption. But I need a general purpose RESTful service, say, I do not know the schema of a table. So I decided to implement it myself using raw JDBC interface.

My class is so-called MysqlStreamTemplate:

  • It does not extend JdbcTemplate, since there is only one interface for streaming, not one series. I’m not writing a general purpose library.
  • It is MySQL only, I have no time to verify with other relation databases.
  • It does accept a DataSource as the parameter of the its constructor.
  • Staff like Hibernate session is not concerned, since it maintains Statement & Connection by itself.
  • Staff like @Transcational is not concerned, since we do not care about transactions. Actually, MySQL gives HOLD_CURSORS_OVER_COMMIT in StatementImpl#getResultSetHoldability() in its JDBC driver, saying that our ResultSet survives after commit.

So, here is my class. NOTE: closing our Statement & Connection requires explicit invoke of Stream#close():

Read inline comments for additional details. Now the response entry and controller mapping:

Complete code can be find on my GitHub repository.

My simple benchmark script looks like:

Dramatic improvements in memory usage as shown in jconsole, especially Old Gen:

Some raw data from jmap:

  • Jersey
  • Spring Boot
  • Spring Boot with Streams

Well, new to the big data world.

Following Appendix A in the book Hadoop: The Definitive Guide, 4th Ed, just get it to work. I’m running Ubuntu 14.04.

1. Download and unpack the hadoop package, and set environment variables in your ~/.bashrc.

Verify with:

The 2.5.2 distribution package is build in 64bit for *.so files, use the 2.4.1 package if you want 32bit ones.

2. Edit config files in $HADOOP_HOME/etc/hadoop:

3. Config SSH:
Hadoop needs to start daemons on hosts of a cluster via SSH connection. A public key is generated to avoid password input.

Verify with:

4. Format HDFS filesystem:

5. Start HDFS:

Verify running with jps command:

6. Some tests:

7. Stop HDFS:

8. If there is an error like:

Just edit $HADOOP_HOME/etc/hadoop/ and export JAVA_HOME explicitly again here. It does happen under Debian. Not knowing why the environment variable is not passed over SSH.

9. You can also set HADOOP_CONF_DIR to use a separate config directory for convenience. But make sure you have the whole directory copied from the Hadoop package. Otherwise, nasty errors may occur.