database

Image by mcmurryjulie↗

Disclaimer: JPA is a really helpful technology when accessing relational databases with java. I don’t want to go back to the times when all we had was the simple JDBC API and we all used to write handcrafted value mappings from query results to java objects.

But there are three anti-patterns, that I see again and again when using JPA. People often don’t understand the underlying technology and suffer from the leaking abstractions in JPA.

The solution to most of the problems is thinking: “How would I do that using just plain SQL?”

1. Eager vs. Lazy loading

Thinking about a simple database with just 3 tables:

Table User with columns: “ID” and “name”

Table Adress with columns: “ID”, “user_id” (foreign key to table user), street, zip, city.

Table Change_Log with columns: “ID”, “user_id” (foreign key to table user), “changes”, “timestamp”

One user can have several addresses. One user usually has many change_log records.

Often you find something like this in the java class:

class User {
  
  ...
  
  @JoinColumn("user_id")
  @OneToMany
  List<Address> address; 
  
  @JoinColumn("user_id")
  @OneToMany
  List<ChangeLog> changeLog; 
  
  ...
  
}

For the @OneToMany annotation you should really carefully think about the fetch type.

Fetch type eager

Every time your software tries to access a User, it will load the address/changeLog as well. Speaking SQL:

Instead of SELECT * FROM User it will do a SELECT * from USER JOIN Address ON ... JOIN Change_Log ON ... .

Maybe for the address this might be a good idea: I assume most of our users only have one address and we will need to display it together with the user.

Maybe for the change log that might be a bad idea: One user has many changes and you don’t need them most of the time.

Fetch type lazy

Every time your software tries to access a User, it just load the user (“SELECT * FROM User”). No join.

But if you want to get the address as well (user.getAddress()), it will need to go back the database and fetch the address.

Maybe an other colleague will continue programming on that object and he thinks: “It’s just a getter, why does it take so much time to execute?”

No, it’s not just a getter. It’s a really complex thing behind it.

Now you might feel the my pain when I discover loops in loops in loops of joined entities….

Maybe it might be better to create a new query method in the repository or even a new repository.

Or you even want to think about using document oriented databases like CouchDB↗, MongoDB↗ or DynamoDB↗.

2. Loop vs. Join

Maybe on one day you need to clean up the changeLog that is older than one year.

Sometimes I see things like:


for (ChangeLog changeLog: user.getChangeLog()) {
  if (changeLog.getTimestamp().isLessThan(oneYearAgo)) {
    changeLog.delete();
  }
}

What do you think about the execution time?

If you want to delete 300 out of 1000 change log records you have how many database turnarounds?

How would JPA translate that to SQL?

1 time: SELECT * from adress where user_id =...

Now you’ve got 1000 records in RAM (spoiler: for nothing)!

300 times: DELETE from address where ID = ...

How would you do that in SQL?


DELETE FROM address where USER_ID = ... AND TIMESTAMP < ... 

One and only one SQL statement!

If you think about modern microservices and performance it’s not about quicksort vs. bubblesort. It’s just about the numbers of remote calls. And having just one remote call or 300 remote calls, that’s quite a difference.

Please don’t loop over joined entities and change data in java. Use plain SQL!

3. Forgetting about transaction boundaries

Sometimes I see stacktraces of SQLIntegrityConstraintViolationExceptions rising in some place in some framework. I try to find which SQL statement is responsible from the source of the stack trace, but that’s impossible.

Why?

Usually your database does not write data, when you execute an “UPDATE” statement. It only writes the data as you “COMMIT” your transaction. If you forget about transactions, then JPA/your application server will just care for you and try to commit everything you’ve written. But if you mess up the integrity of the data, you will almost never sort out the reason for the failing update.

So, please:

Care about transactions! Annotate the right method with @Transactional and then catch and log the sql exceptions with a useful error message!

Thank you!

*) There are some rare cases, when the database is under heavy load and it has troubles to handle table locking between different processes. You might already have experienced some deadlock situations. Then it might be helpful if you just update/delete and commit just row by row instead of updating/deleting one huge set of data with one statement. But usually that’s not the case.

Additional Resources:

Update 2022-09-03: added resources links.

Update 2022-12-29: added resource link.

Update 2023-01-20: added resource link.

Any comments or suggestions? Leave an issue or a pull request!