Thus far, we have concentrated on simple, single-column indexes and simple queries that can be tuned in this way. Single-column indexes created on
time_create columns will help us with the queries we'll use to filter or sort their values. Even the following query:
SELECT * FROM tbl WHERE owner = 'dejan' AND time_create > '2006-01-01 00:00:00' ORDER BY time_create
will have acceptable performances. But if you try to execute this query:
SELECT * FROM tbl WHERE owner = 'dejan' ORDER BY time_create
you will get a very long execution time. This is because of the extra sorting step that the database needs to perform in order to sort data.
The solution for these types of queries is to create an index that will cover both the
time_create columns. We can achieve this by executing the following query:
CREATE INDEX tbl_owner_time_create ON tbl(owner, time_create)
With this index in use, the query performance will dramatically improve. Now, notice the following lines in the analyzer log:
Index Scan ResultSet for TBL using index TBL_OWNER_TIME_CREATE at read committed isolation level using share row locking chosen by the optimizer
We have helped the database by letting it use a handy index to quickly find already sorted data.
The important thing to notice in this example is that column order in the
CREATE INDEX statement is very important. Multiple-column indexes are optimizable by the first column defined during index creation. So, if we had created the following index:
CREATE INDEX tbl_time_create_owner ON tbl(time_create, owner)
instead of one we used previously, we wouldn't see any performance benefits. That is because the Derby optimizer could not consider this index as the best execution path and it would simply be ignored.
Indexes can help us improve performance when data selection is in question. But they slow down database insert, and delete and possibly update operations. Since we not only have table structure, but various index structures, it takes longer for the database to maintain all these structures when data changes.
For example, when we are inserting a row in a table, the database must update all indexes related to columns of that table. That means that it has to insert an indexed column value in the right place in the appropriate index, and that takes time. The same thing happens when you delete a certain row, because the index must be kept ordered. Update actions affect indexes only when you update indexed columns, since the database must relocate those entries in order to keep indexes sorted.
So, the point is to optimize database and application design according to your needs. Don't index every column; you might not use those indexes, and you might need to optimize your database for fast inserting of data. Measure your performance early and identify bottlenecks; only then should you try to implement some of the techniques provided in this article.
In this article we have focused on just a small subset of performance-related issues you can find in everyday development tasks. Most of the principles shown here could be used (with some modifications) to any relational database system available. There are many other techniques that can help you improve the performance of your application. Caching is certainly one of the most effective and widely used approaches. There are many caching solutions for Java developers (some of them, such as OSCache or EHCache, have open source licenses) that could serve as a buffer between the application and database and thus improve overall application performance. Also, many object-relation frameworks used in Java projects (such as Hibernate) have built-in caching capabilities, so you should consider those solutions as well, but that's the material for another discussion.
- Derby home page
Dejan Bosanac is a software developer, technology consultant and author. He is focused on the integration and interoperability of different technologies, especially the ones related to Java and the Web.
Return to ONJava.com.