In the modern world, data is the most valuable asset, and it is no surprise that, whatever the application, data handling and processing is part of its core functionality. With ORM (Object/Relational Mapping) frameworks such as Hibernate or EclipseLink, Java offers real help to developers in accessing application data: the framework provides an abstract data access layer that is in charge of retrieving application data from the database and saving it back.
“An application developer should not care how data is saved or retrieved, as long as it is saved or retrieved.”
Or should we care?
Recently, we had a real situation where an application backend simply crashed after several large heap memory allocations (as shown in the picture): more than 2 GB of memory was allocated in a very short amount of time.
Using VisualVM, we were able to follow the memory allocation in real time and see the objects allocated by the application.
Forcing garbage collection was only a temporary solution: the heap was released, but after a short period of time the memory allocation started to increase again.
The fact that the garbage collector was able to release the memory was a hint that we were not dealing with a memory leak; something else had to be generating the high memory usage.
In the application, we use FetchType.LAZY on all relations and pagination everywhere, to avoid unnecessary database load from retrieving data that is not needed. A paginated query is sketched below.
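As an illustration only (a minimal sketch with plain JPA, not the application's actual repository code; the class and method names are made up, and it assumes a javax.persistence setup, or jakarta.persistence on newer stacks, plus an id field on Task):

import java.util.List;
import javax.persistence.EntityManager;

// Hypothetical helper class, shown only to illustrate pagination.
public class TaskQueries {

    // Loads one page of tasks; the lazy author relation stays unloaded.
    public static List<Task> findTaskPage(EntityManager em, int page, int pageSize) {
        return em.createQuery("select t from Task t order by t.id", Task.class)
                .setFirstResult(page * pageSize) // offset of the first row
                .setMaxResults(pageSize)         // number of rows per page
                .getResultList();
    }
}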
VisualVM also allowed us to take a heap dump right after one of these big heap memory allocations:
One of our domain classes, Task, had an enormous number of instances on the heap: 8.3 million records. Despite pagination and lazy fetching being enabled, Hibernate had somehow managed to load all of these elements into memory. This was the cause of the steep increases in memory usage from time to time.
Time to check the code
In the domain model we have two classes, Task and User. There is a @OneToMany relation from User to Task, and a @ManyToOne relation from Task back to User.
/* Relation between User and Task (in the User entity) */
@Transient
private int tasksNo;

@OneToMany(mappedBy = "author")
@JsonIgnore
private List<Task> tasks;

/* Relation between Task and User (in the Task entity) */
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "author_id")
@JsonIgnore
private User author;
At the domain level, everything seems to be fine.
At the business level, the backend handles several REST API calls related to User and Task. In some of the API responses we need the total number of tasks assigned to a User. Since we return a JSON response, we have the @Transient field tasksNo, which should contain the number of tasks, and a getter defined for it that counts the tasks:
public int getTasksNo() {
    if (tasks != null) {
        return tasks.size();
    }
    return 0;
}
Now, this is a problem. Even though tasks is fetched lazily, Hibernate has to load every record of the collection in order to count its size. For a user with 8.3 million tasks, calling size() ends up retrieving 8.3 million objects from the database, and in the end, memory problems.
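To make the trigger explicit, here is a small sketch (it assumes a getTasks() getter on User that is not shown in the original code, a hypothetical userId, an EntityManager em, and an open persistence context; Hibernate.isInitialized() comes from org.hibernate.Hibernate):

User user = em.find(User.class, userId);
System.out.println(Hibernate.isInitialized(user.getTasks())); // false: still a lazy proxy
int n = user.getTasksNo();                                    // internally calls tasks.size()
System.out.println(Hibernate.isInitialized(user.getTasks())); // true: the whole collection was loaded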
Use the right tool for the job
Obviously, this is not the right way to retrieve the size of a collection. The framework offers a solution to this problem: in Hibernate it is possible to attach a formula to a field:
@Formula("(select count(task.id) from task where task.author_id = id)")
private int tasksNo;
This way, Hibernate counts the records with the provided query when the entity is loaded, avoiding the unnecessary data retrieval.
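As a side note (this alternative is not part of the original code, just a common option), when the count is only needed occasionally, a dedicated JPQL count query avoids mapping the extra column on the entity at all:

long tasksNo = em.createQuery(
        "select count(t) from Task t where t.author = :author", Long.class)
        .setParameter("author", user)
        .getSingleResult();

The @Formula approach has the advantage of keeping the count directly on the entity, so it is available in the JSON responses without any extra service code.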