postgresql and jdbc

In postgresql – selecting maximum for each group, we saw the actual stored function.  That was the easy part.  Then there was dealing with Postgresql and JDBC.  I encountered a few surprises while doing this.

Calling the stored function

I expected to use JDBC’s CallableStatement since I was calling a stored procedure.  Nope.  The proper way to call it is:


PreparedStatement stmt = conn.prepareStatement("select * from highlighted_topic_per_forum(?) limit ?");

stmt.setString(1, formattedDate);
stmtp.setInt(2, limit);

Passing a timestamp

Originally, I was passing a timestamp into the stored function by calling stmt.setTimestamp() and a type of “timestamp without time zone”.  Try as I may, postgresql didn’t like this.  It complains $1 not found.  I eventually decided to pass the timestamp as a string.

Passing an array

I was originally planning to pass the category ids via JDBC.  Given what what I experienced with timestamps, I wasn’t so enthusiastic about trying this on an application I work on in my free time.  When I saw the advice online to extend the Array class, I decided to hard code the two category ids.

postgresql – selecting maximum for each group

I had a requirement to return the thread with the most replies in each forum at JavaRanch‘s Coderanch forums.  In Postgresql 8.4, this would be very easy – just use the window functions.  Unfortunately, we aren’t on Postgresql 8.4 yet.  The other common pattern is something like

select stuff
from mytable t1
where date = (select max(date)
from mytable t2
where t1.key= t2.key)

This doesn’t work well for me either because the value I am comparing (number of posts is dynamic.)  I decided to use a stored procedure to “simplify” things.  I’ve written some stored procedures in Postgresql to do updates before and stored procedures in other languages to do queries so this didn’t seem like a huge task.

Postgresql calls them stored functions, so let’s proceed.  First you need to create a type to represent a row that gets returned by the stored function.

CREATE TYPE highlighted_topic_per_forum_holder AS
(last_user_id integer,
post_time timestamp without time zone,
<other columns go here>);

Then you create the stored procedure.  The outer loop goes through each forum.  The inner loop is the SQL that finds the post with the most replies that was posted to in some time period.  It uses a nested query with a limit to return only 1 thread per forum.  The rest of the SQL adds the relevant data.  See postgresql and JDBC for why it takes varchar rather than timestamp.

CREATE or replace FUNCTION highlighted_topic_per_forum(character varying)
RETURNS SETOF highlighted_topic_per_forum_holder AS
$BODY$
DECLARE
forum RECORD;
r highlighted_topic_per_forum_holder%rowtype;
BEGIN
for forum in EXECUTE 'select forum_id from jforum_forums where categories_id in (1,7)' loop
for r in EXECUTE 'select p.user_id AS last_user_id, p.post_time, p.attach AS attach, t.* '
|| 'from jforum_topics t, jforum_posts p, '
|| '(select topic_id, count(*) from jforum_posts '
|| ' where post_time >= date '' ' || $1 || ' '' '
|| ' and forum_id = ' || forum.forum_id
|| ' AND need_moderate = 0 '
|| ' group by topic_id order by 2 desc limit 1) nested '
|| ' where p.topic_id = nested.topic_id '
|| ' and p.post_id = t.topic_last_post_id '
|| ' order by post_time desc' loop
return next r;
end loop;
end loop;
return;
END;
$BODY$
LANGUAGE 'plpgsql';

normalization done poorly at the bank

I went to a bank yesterday to inquire why I was receiving mail at my old address despite having moved a number of months ago.  The first thing the representative did was pull up my record to show me that my address was correct in their system.  While that’s very nice, the mail has their name on it.  What’s more, I have received mail from this bank at my correct address already.

Then the representative tells me he’ll try to change it again in case the other person didn’t do it right.  While doing this he finds the problem and shows it to me.  This bank, in all their wisdom, has the customer’s mailing address on two screens.  A representative must make the change in two places.

Lessons from this:

  1. Scott blogged about normalization last year.  Having information in two places invites one to be wrong.  And we aren’t talking about the name of the state here.  Customer address is a piece of information you would expect to change over time.
  2. If you do have the same data in two places, you should write code that updates both rather than require people to do it.