Just wanted to share something we found out while researching an issue. We needed to retrieve all rows from a very large table (approximately 2 million rows) in a Java program using JDBC.
When we tried to retrieve them with standard JDBC, the program kept running out of memory. We could have increased the heap size, but that would not have been a deterministic fix: if the row count later grew to 5 million, we would have had to increase the heap again.
We decided instead to read the rows in batches, processing each batch before retrieving the next. A bit of searching on the internet turned up nothing obviously helpful, so we dug further into the JDBC API and came up with the following solution; we would appreciate any comments on it.
We created a JDBC Statement object with ResultSet type TYPE_SCROLL_INSENSITIVE, which (with our driver) ensured that the entire result set of the select * query did not get loaded into memory on executeQuery:
stmt = dbConnection.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
After that we set the fetch size to 1000, since the default is only 10 rows per round trip, which makes batch processing quite slow: stmt.setFetchSize(1000); rs = stmt.executeQuery(sql);
To navigate through the ResultSet we used the absolute() API to position the cursor (rs.absolute(startRow)) and then read the entire batch using the standard navigation methods. Lo and behold, we were able to process millions of rows without an issue.
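Putting the steps above together, here is a minimal sketch of the approach. The JDBC URL, credentials, the table name big_table, and the processRow stub are all placeholders, not part of the original write-up, and the batch size of 1000 is the value mentioned above; note that whether TYPE_SCROLL_INSENSITIVE avoids loading the full result set into memory depends on the driver.

```java
import java.sql.*;

public class BatchReader {
    // Batch/fetch size from the post; tune for your driver and row width.
    private static final int BATCH_SIZE = 1000;

    public static void main(String[] args) throws SQLException {
        // Placeholder URL, user, password, and table name.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:yourdb://host/db", "user", "pass");
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_SCROLL_INSENSITIVE,
                     ResultSet.CONCUR_READ_ONLY)) {
            // Default fetch size is often 10 rows per round trip; raise it.
            stmt.setFetchSize(BATCH_SIZE);
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                int startRow = 1; // JDBC rows are 1-based
                // absolute() returns false once startRow is past the last row.
                while (rs.absolute(startRow)) {
                    int rowsInBatch = 0;
                    do {
                        processRow(rs); // per-row work goes here
                        rowsInBatch++;
                    } while (rowsInBatch < BATCH_SIZE && rs.next());
                    startRow += BATCH_SIZE;
                }
            }
        }
    }

    private static void processRow(ResultSet rs) throws SQLException {
        // Stub: read the first column; adapt to your schema.
        rs.getObject(1);
    }
}
```

Running this requires a live database connection, so treat it as a structural sketch rather than a drop-in program.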