Friday, July 31, 2009

@BatchSize() Annotation in Hibernate

While searching the net for ways to gain performance in JPA/Hibernate, I stumbled upon an article on Hibernate's @BatchSize annotation, and it intrigued me how it really works. After reading on and going through the algorithm Hibernate defines for it, I understood that it can give a significant performance gain when your entities hold collections that are loaded lazily for many parent objects.

 --------Updated on 07/30/2012 according to the clarification given by Jeremy---------

For example, let's consider an Airport parent entity containing a collection of airlines, like so:

    @OneToMany()
    List<Airlines> airlines = new ArrayList<Airlines>();

Imagine you had such a statement in one of your entities. Assume there can be 100 Airport objects loaded at any given time. Without the @BatchSize annotation, Hibernate first retrieves the airports and then, as each airport's airlines collection is accessed, issues a separate query for that collection. Initializing the collection of every airport therefore costs 100 additional queries. Now consider the following code snippet:

    @OneToMany
    @BatchSize(size=16)
    List<Airlines> airlines = new ArrayList<Airlines>();

When you use the @BatchSize annotation, Hibernate initializes the airlines collections of several airports in a single query instead of issuing one query per airport. The Airport objects themselves are loaded exactly as before; the batch size only controls how many uninitialized collections are fetched together. In this case 100 / 16 gives 6 with a remainder of 4. What this implies is that, if you end up touching the collection of every airport, Hibernate will go to the database 6 times, each time initializing the airlines collections of 16 airports, and then once more to initialize the collections of the remaining 4 airports. That is 7 collection queries instead of 100. So what @BatchSize does is decide how many collections are initialized per query.
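
The arithmetic above can be sketched in plain Java. This is only a simplified simulation, not Hibernate's actual batch-loading code: the class `BatchSizeSketch` and its methods `partition` and `collectionQuery` are hypothetical names made up for illustration, the airport ids are invented, and the SQL string simply mirrors the shape quoted in the first comment below. It assumes all 100 airports are loaded and every collection is eventually accessed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: simulates how 100 uninitialized collections
// would be grouped when the batch size is 16. Not part of Hibernate.
class BatchSizeSketch {

    // Splits the ids of the owning airports into chunks of at most batchSize.
    public static List<List<Integer>> partition(List<Integer> ownerIds, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        for (int start = 0; start < ownerIds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ownerIds.size());
            batches.add(new ArrayList<>(ownerIds.subList(start, end)));
        }
        return batches;
    }

    // Builds an illustrative query with one placeholder per airport in the batch.
    // The table and column names are assumptions taken from the comment thread.
    public static String collectionQuery(int idsInBatch) {
        StringBuilder sql = new StringBuilder("select ... from airlines where airport_id in (");
        for (int i = 0; i < idsInBatch; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        // Invented ids 1..100 standing in for the 100 loaded airports.
        List<Integer> airportIds = new ArrayList<>();
        for (int id = 1; id <= 100; id++) {
            airportIds.add(id);
        }

        List<List<Integer>> batches = partition(airportIds, 16);

        // Prints "collection queries: 7" (6 batches of 16 plus 1 batch of 4).
        System.out.println("collection queries: " + batches.size());

        // Show the query shape for the last, smaller batch.
        int lastBatchSize = batches.get(batches.size() - 1).size();
        System.out.println(collectionQuery(lastBatchSize));
    }
}
```

Real Hibernate picks the ids to batch from the uninitialized collections it currently tracks in the session, and it may choose the chunk sizes differently, so treat the numbers here as the idea rather than a guarantee of the exact SQL you will see in your logs.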

3 comments:

  1. I think you've totally misunderstood this one. BatchSize doesn't affect how many attached objects are retrieved. It affects how many uninitialised collection associations are fetched in one query.

    Regardless of the @BatchSize setting, the example you've got will always retrieve all (100) Airlines. @BatchSize controls how many lists of airlines will be retrieved at once. Assume the entity containing Airlines is called Airport. If you have @BatchSize(size=4), and have 8 airports loaded without their associations initialised, then when you try to initialise the airlines list in one airport, Hibernate will also load the airlines for 3 other airports at the same time, thus *saving* 3 trips to the database.

    The generated SQL query goes from "select ... from airlines where airport_id = ?" to "select ... from airlines where airport_id in (?, ?, ?, ?)"

  2. Hi Jeremy,

    That is an interesting point you have brought forward. I never thought of the functionality in that way. But I have a question, and would really appreciate it if you could clarify it for me.

    Please check the following link:

    http://www.mkyong.com/hibernate/hibernate-fetching-strategies-examples/

    According to him, in his example, the collection is filled with a maximum number of 3 queries hitting the DB because of the @BatchSize annotation.

    At the time I blogged this example, I did a similar test, and the DB calls needed to fetch the collection were in fact reduced due to the annotation.

    What you're saying is that when loading the collection of one instance, Hibernate will also go and fetch the collections of three other instances, right? But aren't these unnecessary calls if I were loading the instance based on its primary key?

  3. Hi again Jeremy,

    Looks like you were correct. Got it clarified through this link;

    http://www.jroller.com/eyallupu/entry/tuning_queries_using_paging_batch

    Thank you very much for bringing this forward. I will amend the post with the changes.

    Cheers

    Dinuka
