Live from JavaOne: JDBC Spec Enhancements

Live from the JavaOne conference again, this time attending a session on enhancements planned for the JDBC 4.1 specification. Four members of the JDBC specification team presented a number of useful enhancements due in the upcoming version of JDBC. Some of these features may not make it into 4.1, but they should appear in future versions of JDBC down the road.

1. Named Prepared Statements
As many of you may already know, prepared statements support parameters referenced by numeric index, such as ps.setString(1, "val1"), where 1 corresponds to the first "?" in the SQL string. Perhaps rightfully so, developers new to JDBC often complain that JDBC should support named parameters in addition to numeric indexes. Well, those developers are in luck: future versions of JDBC may support syntax such as the following:

NamedPStatement nstmt = con.createNPStatement("SELECT f1, f2, f3 FROM foo " +
     "WHERE f1 = :f1 AND f2 < :f2");
nstmt.setParameter("f1", "val1");
nstmt.setParameter("f2", "val2");
ResultSet rs = nstmt.executeQuery();

A quick screenshot of the slide covering this information:

2. Chaining Commands
Building on the previous example, the team intends to support command chaining: setParameter() returns a reference to the statement itself, allowing single-line calls such as the following.

ResultSet rs = nstmt.setParameter("f1", "val1").setParameter("f2", "val2").executeQuery();

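To give a feel for how that would work, here is a minimal sketch of such a fluent interface. This is purely illustrative and not the actual API the expert group will ship; the point is simply that each setter returns the statement itself:

import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch only: each setter returns "this" so calls can be chained.
public interface NamedPStatement {
    NamedPStatement setParameter(String name, Object value);
    ResultSet executeQuery() throws SQLException;
}
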
3. Changing schemas on the fly
For databases that support the notion of database schemas, you will be able to change your current schema dynamically after the connection is established, using the following command:

Connection connection = ...
connection.setSchema("mySchema");

For those databases that do not support such commands, this would likely be a no-op.

4. Timezone support
Better support for time zones is due in more distant future versions, after Java 1.7 adds a number of new features related to date and time handling.

More information from their slide:

See you in San Francisco!

I’ll be away this week from June 1-5 attending the Sun JavaOne Conference in beautiful San Francisco. I plan to expand my knowledge of advanced J2EE/SOA topics as well as pick up some introductory courses on new technologies. If you want to meet up to discuss any of my recent articles, the conference, or how cold San Francisco seems to be this time of year, drop me an e-mail!

The Joy of Null: Continued

In Part 1 of The Joy of Null I discussed a variety of ways null-equivalent values make their way into software designs. Often, developer laziness or the immutability of the database tier drives developers to insert values that simulate null, rather than using a database null itself. In this second half, I'll talk about other ways null-equivalent values arise, as well as the problems associated with using them.

Part 1: Revisited
The most consistent comment I saw for why some developers choose empty strings over null values was performance. As I've mentioned on this blog before, database query optimizers are often a loose association of greedy algorithms and indexes, and it's certainly possible a DBMS may perform better when a field is an empty string rather than a null, but I doubt it's a general rule. In fact, from a theoretical standpoint, testing whether a field is null should be faster than comparing two strings. Testing whether a field is null is a binary test: the field either is null or it is not. Testing whether a string is empty, or, in the more difficult case, whether a string of blank spaces should be treated as empty, requires a string comparison which, depending on how it's implemented, could be slow.
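
To make the argument concrete on the application side, here is a tiny Java sketch (the variable name value is made up for illustration): a null check is a single reference test, while treating blanks as "missing" forces a scan of the string.

String value = ...;                 // some field read from the database
boolean missing = (value == null);  // binary test: it is null or it is not
boolean blankTreatedAsMissing =
    (value != null && value.trim().isEmpty());  // trim() has to scan the string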

I think when people cite performance, they might be referring to the fact that no string compares equal to null in a database. To query on null you must use "IS NULL" syntax, since (x = null) never evaluates to true for any value of x, null or otherwise. Yes, having both nulls and empty strings in a query can complicate the logic, but your goal should be to remove the empty strings, not the null values.
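
A minimal JDBC sketch of that pitfall; the table and column names (widgets, name) are made up for illustration, and con is assumed to be an open Connection:

// "= NULL" never matches a row, not even rows where name really is null.
PreparedStatement broken = con.prepareStatement(
    "SELECT id FROM widgets WHERE name = NULL");

// "IS NULL" is the correct predicate for finding missing values.
PreparedStatement correct = con.prepareStatement(
    "SELECT id FROM widgets WHERE name IS NULL");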

Part 2: Data Ambiguity
The first problem with null-equivalent values is that you may unintentionally have more than one of them. For example, if you consider an empty string to be a null-equivalent value, do you also consider one or more blank spaces null-equivalent? If so, your software will have to formulate SQL queries with syntax such as "trim(widget) LIKE ''", or alternatively "length(trim(widget)) = 0". Either way, you now have to apply a database function to every string in the table just to determine whether a value is present (which also tends to defeat any index on the column), whereas "IS NULL" should be a lot faster.
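
As a rough sketch of the difference, again with a made-up widgets table and an open Connection con:

// Blank-as-missing forces a function call on every row (and usually ignores any index).
PreparedStatement blankAware = con.prepareStatement(
    "SELECT id FROM widgets WHERE length(trim(widget)) = 0");

// A real null is a single, simple predicate.
PreparedStatement nullAware = con.prepareStatement(
    "SELECT id FROM widgets WHERE widget IS NULL");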

The second problem with null-equivalent values is that you may intentionally have more than one of them. I'm going to use the term enumerated null values for a set of values that all represent null but might mean slightly different things. As one reader pointed out, you may have a DBMS that supports enumerated null values such as NULL_MISSING, NULL_REMOVED, NULL_UNKNOWN, etc. I have no objection to using enumerated nulls, except that very few, if any, major DBMSs support them, largely because it would be difficult to get a group of developers and DBAs to agree on an enumerated set of nulls that works across all domains. With that in mind, the vast majority of the time enumerated null values are used, they are stored as plain strings in the database, such as Name = 'NULL_MISSING'. This has all the performance and string-comparison problems of my previous argument, plus one even worse: someone's name or data might actually match a null-equivalent value. Such a scheme also leaves your system more exposed to SQL injection and requires constant conversion between the enumerated value and a useful value in the UI, since you don't want to expose NULL_MISSING to the user. Keep in mind, this includes alternative null-equivalent values such as using -1 or 0 for a positive numeric field: at some point -1 or 0 may become a legal value, or be accidentally displayed to the user as a real value. No matter how you set it up, enumerated nulls can easily lead to bad data, much like the issue with little Bobby Tables.

Finally, there are many times null-equivalent values are used side by side with null values, leading to unruly queries such as "WHERE widget IS NULL OR length(trim(widget)) = 0". As most good DBAs already know, disjunctive conditions (using OR) in SQL queries can significantly hurt a query optimizer; a disjunction is often the point at which the optimizer can no longer satisfy the condition with a single index and falls back to scanning. In short, if you are using an empty string as a null-equivalent value, then you should use it everywhere and make the column not nullable. At the very least this will simplify your queries and remove the disjunction. There are other ways to formulate the query above without a disjunction, such as using the negation of length > 0, but they still lead to complicated queries.
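
A quick sketch of the payoff, with the same made-up table and an open Connection con: once the column is declared NOT NULL and the empty string is the one agreed-upon "no value" marker, the disjunction disappears.

// Both conventions in play: the query needs a disjunction and a per-row function call.
PreparedStatement messy = con.prepareStatement(
    "SELECT id FROM widgets WHERE widget IS NULL OR length(trim(widget)) = 0");

// Column declared NOT NULL, empty string used consistently: one simple predicate.
PreparedStatement simple = con.prepareStatement(
    "SELECT id FROM widgets WHERE widget = ''");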

Part 3: Choose One
I know there are some developers uncomfortable with using database null or "IS NULL" in their queries, so to them I say: at least be consistent. Either use a null-equivalent value, such as an empty string, everywhere, or use null everywhere. The ambiguity arises when you allow both, which in some data models may even mean different things, and allowing both will likely cause a performance hit as well. As for which you should choose, null or null-equivalent: if you have strong reason to believe your DBMS handles empty strings better than null values, then go with empty strings. I, on the other hand, will stick with database null values, since they should be faster for the vast majority of queries.