Managing Multiple Databases with Spring and Hibernate
We recently had the need to use a couple different databases in the same web app. These are completely separate databases, not shards with matching schemas. What I thought would be a simple task turned into a really annoying day. There's not a ton of (working) documentation around, so I thought I'd take a few minutes to explain how we pulled this off. Just because I care.
Quick disclaimer though -- we haven't used this in production yet, but it seems to be working fine during development and load testing. I welcome any suggestions from people who have done this differently. As for versions, we're using Spring 2.5.6, Hibernate 3.3.1 and Atomikos 3.5.8. You can download the latest version of Atomikos Transaction Essentials from their site, or use their maven repository at http://repo.atomikos.com.
The basic idea here is pretty simple. You define multiple datasources, wire them into the correct DAOs, and then use whatever DAOs are necessary in your service layer. The thing that makes this difficult is transaction management. In order to do this in a transactional manner, we discovered that we had to use JTA. We're not using an application server that supports JTA, so that meant looking into a standalone JTA solution. There are a couple that pop up when searching for this, JTOM and Atomikos. Out of the two, we chose Atomikos for various scientific reasons which I won't bore you with (we flipped a coin), so that's what I'll be focusing on here.
Once we started down the JTA path, we realized that we also needed a XA datasource, so there were a lot of changes compared to our previously basic set-up of datasource + transaction manager. So let's get down to business and take a look at some of that pretty and succinct Spring configuration we've all come to know and love. We created a Spring application context file for our persistence layer to house all of this loveliness.
First, let's look at the datasource configs:
<bean id="dataSourceOne" >
<property name="uniqueResourceName"><value>dataSourceOne</value></property>
<property name="xaDataSourceClassName"><value>com.mysql.jdbc.jdbc2.optional.MysqlXADataSource</value></property>
<property name="xaProperties">
<props>
<prop key="user">${ds1.jdbc.username}</prop>
<prop key="password">${ds1.jdbc.password}</prop>
<prop key="url">${ds1.jdbc.url}</prop>
<prop key="allowMultiQueries">true</prop>
<prop key="pinGlobalTxToPhysicalConnection">true</prop>
<prop key="autoReconnect">true</prop>
<prop key="autoReconnectForConnectionPools">true</prop>
<prop key="autoReconnectForPools">true</prop>
</props>
</property>
<property name="maxPoolSize" value="${ds1.pool.max.size}" />
<property name="minPoolSize" value="${ds1.pool.min.size}" />
<property name="maxIdleTime" value="${ds1.pool.max.idle.time}" />
<property name="testQuery" value="${ds1.pool.preferred.test.query}" />
<property name="reapTimeout" value="0" />
</bean>
<bean id="dataSourceTwo" >
<property name="uniqueResourceName"><value>dataSourceTwo</value></property>
<property name="xaDataSourceClassName"><value>com.mysql.jdbc.jdbc2.optional.MysqlXADataSource</value></property>
<property name="xaProperties">
<props>
<prop key="user">${ds2.jdbc.username}</prop>
<prop key="password">${ds2.jdbc.password}</prop>
<prop key="url">${ds2.jdbc.url}</prop>
<prop key="allowMultiQueries">true</prop>
<prop key="pinGlobalTxToPhysicalConnection">true</prop>
<prop key="autoReconnect">true</prop>
<prop key="autoReconnectForConnectionPools">true</prop>
<prop key="autoReconnectForPools">true</prop>
</props>
</property>
<property name="maxPoolSize" value="${ds2.pool.max.size}" />
<property name="minPoolSize" value="${ds2.pool.min.size}" />
<property name="maxIdleTime" value="${ds2.pool.max.idle.time}" />
<property name="testQuery" value="${ds2.pool.preferred.test.query}" />
<property name="reapTimeout" value="0" />
</bean>
As you see, we have two nearly identical datasources, each with a different bean id, uniqueResourceName, and property placeholder names. The uniqueResourceName property is important in this case, as you'll need a unique name for all of the datasources that are going to be a part of this JTA transaction manager. The property placeholders are filled in the usual way, using Spring's PropertyPlaceholderConfigurer.
Also, if you look at the properties of the AtomikosDataSourceBean, you'll notice that there is another property called poolSize. We did a bit of testing with this and discovered that it's used to set a fixed sized pool, and when set the min/max pool size settings will be ignored. Unfortunately with Atomikos this doesn't seem to be documented anywhere, so it's all trial and error.
For each JDBC datasource, we need a matching hibernate SessionFactory, like so:
<bean id="sessionFactoryOne" >
<property name="dataSource">
<ref bean="dataSourceOne" />
</property>
<property name="annotatedClasses">
<list>
...
</list>
</property>
<property name="hibernateProperties">
<props>
...
<!-- atomikos -->
<prop key="hibernate.current_session_context_class">jta</prop>
<prop key="hibernate.transaction.factory_class">
com.atomikos.icatch.jta.hibernate3.AtomikosJTATransactionFactory
</prop>
<prop key="hibernate.transaction.manager_lookup_class">
com.atomikos.icatch.jta.hibernate3.TransactionManagerLookup
</prop>
</props>
</property>
...
</bean>
<bean id="sessionFactoryTwo" >
<property name="dataSource">
<ref bean="dataSourceTwo" />
</property>
<property name="annotatedClasses">
<list>
...
</list>
</property>
<property name="hibernateProperties">
<props>
...
<!-- atomikos -->
<prop key="hibernate.current_session_context_class">jta</prop>
<prop key="hibernate.transaction.factory_class">
com.atomikos.icatch.jta.hibernate3.AtomikosJTATransactionFactory
</prop>
<prop key="hibernate.transaction.manager_lookup_class">
com.atomikos.icatch.jta.hibernate3.TransactionManagerLookup
</prop>
</props>
</property>
...
</bean>
Note the the only thing that changes across those configs is the bean id and the dataSource property, which should refer to the correct datasource bean.
Now, we create HibernateTemplates for each SessionFactory.
<bean id="hibernateTemplateOne" >
<property name="sessionFactory" ref="sessionFactoryOne"/>
</bean>
<bean id="hibernateTemplateTwo" >
<property name="sessionFactory" ref="sessionFactoryTwo"/>
</bean>
Again, the only thing that changes across those two is the bean id and the sessionFactory property.
And now for the Atomikos transaction management. This is pretty much straight from their documentation. Note that transactionTimeout is in seconds here, not milliseconds.
<bean id="AtomikosTransactionManager" init-method="init" destroy-method="close">
<property name="forceShutdown" value="true" />
<!-- this prop is in seconds -->
<property name="transactionTimeout" value="300"/>
</bean>
<bean id="AtomikosUserTransaction" >
<!-- this prop is in seconds -->
<property name="transactionTimeout" value="300" />
</bean>
<bean id="transactionManager" >
<property name="transactionManager" ref="AtomikosTransactionManager" />
<property name="userTransaction" ref="AtomikosUserTransaction" />
<property name="transactionSynchronizationName" value="SYNCHRONIZATION_ON_ACTUAL_TRANSACTION" />
</bean>
This seems to work with all the flavors of Spring's transaction management, but we use the handy annotations, so our context file also specifies:
<tx:annotation-driven />
Finally, by default (at least in our version), Atomikos expects a conf file for some other settings. It is called jta.properties, and it needs to be at the root of your classpath. Ours looks like this:
## Props for Atomikos JTA
com.atomikos.icatch.service = com.atomikos.icatch.standalone.UserTransactionServiceFactory
# max active transactions
com.atomikos.icatch.max_actives = 100
# tx logs
com.atomikos.icatch.log_base_dir = /var/log/tx/
# output logs
com.atomikos.icatch.output_dir = /var/log/
com.atomikos.icatch.console_log_level = WARN
com.atomikos.icatch.automatic_resource_registration=true
com.atomikos.icatch.serial_jta_transactions=false
These are described in the Atomikos docs, but max_actives is the max number of open transactions allowed, so you may want to tweak this value. Also, the log_base_dir is important as it is where Atomikos writes transactions in case of system failure. Their site mentions setting that to a reliable SAN or RAID storage, so you can read more about that if you'd like.
And that's about it. We have an AbstractDAO for each of our HibernateTemplates, and we just wire the correct one in. You can either Autowire by name, or by manually defining those beans in your Spring config and injecting the necessary bean.
I hope that helps anybody looking to do this, and I'll be sure to write another post if we need to tune/tweak/run away from using this once we pound on it a bit more.
We recently had the need to use a couple different databases in the same web app. These are completely separate databases, not shards with matching schemas. What I thought would be a simple task turned into a really annoying day. There's not a ton of (working) documentation around, so I thought I'd take a few minutes to explain how we pulled this off. Just because I care.
Quick disclaimer though -- we haven't used this in production yet, but it seems to be working fine during development and load testing. I welcome any suggestions from people who have done this differently. As for versions, we're using Spring 2.5.6, Hibernate 3.3.1 and Atomikos 3.5.8. You can download the latest version of Atomikos Transaction Essentials from their site, or use their maven repository at http://repo.atomikos.com.
The basic idea here is pretty simple. You define multiple datasources, wire them into the correct DAOs, and then use whatever DAOs are necessary in your service layer. The thing that makes this difficult is transaction management. In order to do this in a transactional manner, we discovered that we had to use JTA. We're not using an application server that supports JTA, so that meant looking into a standalone JTA solution. There are a couple that pop up when searching for this, JTOM and Atomikos. Out of the two, we chose Atomikos for various scientific reasons which I won't bore you with (we flipped a coin), so that's what I'll be focusing on here.
Once we started down the JTA path, we realized that we also needed a XA datasource, so there were a lot of changes compared to our previously basic set-up of datasource + transaction manager. So let's get down to business and take a look at some of that pretty and succinct Spring configuration we've all come to know and love. We created a Spring application context file for our persistence layer to house all of this loveliness.
First, let's look at the datasource configs:
<bean id="dataSourceOne" >
<property name="uniqueResourceName"><value>dataSourceOne</value></property>
<property name="xaDataSourceClassName"><value>com.mysql.jdbc.jdbc2.optional.MysqlXADataSource</value></property>
<property name="xaProperties">
<props>
<prop key="user">${ds1.jdbc.username}</prop>
<prop key="password">${ds1.jdbc.password}</prop>
<prop key="url">${ds1.jdbc.url}</prop>
<prop key="allowMultiQueries">true</prop>
<prop key="pinGlobalTxToPhysicalConnection">true</prop>
<prop key="autoReconnect">true</prop>
<prop key="autoReconnectForConnectionPools">true</prop>
<prop key="autoReconnectForPools">true</prop>
</props>
</property>
<property name="maxPoolSize" value="${ds1.pool.max.size}" />
<property name="minPoolSize" value="${ds1.pool.min.size}" />
<property name="maxIdleTime" value="${ds1.pool.max.idle.time}" />
<property name="testQuery" value="${ds1.pool.preferred.test.query}" />
<property name="reapTimeout" value="0" />
</bean>
<bean id="dataSourceTwo" >
<property name="uniqueResourceName"><value>dataSourceTwo</value></property>
<property name="xaDataSourceClassName"><value>com.mysql.jdbc.jdbc2.optional.MysqlXADataSource</value></property>
<property name="xaProperties">
<props>
<prop key="user">${ds2.jdbc.username}</prop>
<prop key="password">${ds2.jdbc.password}</prop>
<prop key="url">${ds2.jdbc.url}</prop>
<prop key="allowMultiQueries">true</prop>
<prop key="pinGlobalTxToPhysicalConnection">true</prop>
<prop key="autoReconnect">true</prop>
<prop key="autoReconnectForConnectionPools">true</prop>
<prop key="autoReconnectForPools">true</prop>
</props>
</property>
<property name="maxPoolSize" value="${ds2.pool.max.size}" />
<property name="minPoolSize" value="${ds2.pool.min.size}" />
<property name="maxIdleTime" value="${ds2.pool.max.idle.time}" />
<property name="testQuery" value="${ds2.pool.preferred.test.query}" />
<property name="reapTimeout" value="0" />
</bean>
As you see, we have two nearly identical datasources, each with a different bean id, uniqueResourceName, and property placeholder names. The uniqueResourceName property is important in this case, as you'll need a unique name for all of the datasources that are going to be a part of this JTA transaction manager. The property placeholders are filled in the usual way, using Spring's PropertyPlaceholderConfigurer.
Also, if you look at the properties of the AtomikosDataSourceBean, you'll notice that there is another property called poolSize. We did a bit of testing with this and discovered that it's used to set a fixed sized pool, and when set the min/max pool size settings will be ignored. Unfortunately with Atomikos this doesn't seem to be documented anywhere, so it's all trial and error.
For each JDBC datasource, we need a matching hibernate SessionFactory, like so:
<bean id="sessionFactoryOne" >
<property name="dataSource">
<ref bean="dataSourceOne" />
</property>
<property name="annotatedClasses">
<list>
...
</list>
</property>
<property name="hibernateProperties">
<props>
...
<!-- atomikos -->
<prop key="hibernate.current_session_context_class">jta</prop>
<prop key="hibernate.transaction.factory_class">
com.atomikos.icatch.jta.hibernate3.AtomikosJTATransactionFactory
</prop>
<prop key="hibernate.transaction.manager_lookup_class">
com.atomikos.icatch.jta.hibernate3.TransactionManagerLookup
</prop>
</props>
</property>
...
</bean>
<bean id="sessionFactoryTwo" >
<property name="dataSource">
<ref bean="dataSourceTwo" />
</property>
<property name="annotatedClasses">
<list>
...
</list>
</property>
<property name="hibernateProperties">
<props>
...
<!-- atomikos -->
<prop key="hibernate.current_session_context_class">jta</prop>
<prop key="hibernate.transaction.factory_class">
com.atomikos.icatch.jta.hibernate3.AtomikosJTATransactionFactory
</prop>
<prop key="hibernate.transaction.manager_lookup_class">
com.atomikos.icatch.jta.hibernate3.TransactionManagerLookup
</prop>
</props>
</property>
...
</bean>
Note the the only thing that changes across those configs is the bean id and the dataSource property, which should refer to the correct datasource bean.
Now, we create HibernateTemplates for each SessionFactory.
<bean id="hibernateTemplateOne" >
<property name="sessionFactory" ref="sessionFactoryOne"/>
</bean>
<bean id="hibernateTemplateTwo" >
<property name="sessionFactory" ref="sessionFactoryTwo"/>
</bean>
Again, the only thing that changes across those two is the bean id and the sessionFactory property.
And now for the Atomikos transaction management. This is pretty much straight from their documentation. Note that transactionTimeout is in seconds here, not milliseconds.
<bean id="AtomikosTransactionManager" init-method="init" destroy-method="close">
<property name="forceShutdown" value="true" />
<!-- this prop is in seconds -->
<property name="transactionTimeout" value="300"/>
</bean>
<bean id="AtomikosUserTransaction" >
<!-- this prop is in seconds -->
<property name="transactionTimeout" value="300" />
</bean>
<bean id="transactionManager" >
<property name="transactionManager" ref="AtomikosTransactionManager" />
<property name="userTransaction" ref="AtomikosUserTransaction" />
<property name="transactionSynchronizationName" value="SYNCHRONIZATION_ON_ACTUAL_TRANSACTION" />
</bean>
This seems to work with all the flavors of Spring's transaction management, but we use the handy annotations, so our context file also specifies:
<tx:annotation-driven />
Finally, by default (at least in our version), Atomikos expects a conf file for some other settings. It is called jta.properties, and it needs to be at the root of your classpath. Ours looks like this:
## Props for Atomikos JTA
com.atomikos.icatch.service = com.atomikos.icatch.standalone.UserTransactionServiceFactory
# max active transactions
com.atomikos.icatch.max_actives = 100
# tx logs
com.atomikos.icatch.log_base_dir = /var/log/tx/
# output logs
com.atomikos.icatch.output_dir = /var/log/
com.atomikos.icatch.console_log_level = WARN
com.atomikos.icatch.automatic_resource_registration=true
com.atomikos.icatch.serial_jta_transactions=false
These are described in the Atomikos docs, but max_actives is the max number of open transactions allowed, so you may want to tweak this value. Also, the log_base_dir is important as it is where Atomikos writes transactions in case of system failure. Their site mentions setting that to a reliable SAN or RAID storage, so you can read more about that if you'd like.
And that's about it. We have an AbstractDAO for each of our HibernateTemplates, and we just wire the correct one in. You can either Autowire by name, or by manually defining those beans in your Spring config and injecting the necessary bean.
I hope that helps anybody looking to do this, and I'll be sure to write another post if we need to tune/tweak/run away from using this once we pound on it a bit more.