production is slow – tuning jforum for javaranch – part 2

A moderator reported that moving topics took a long time in production and our sandbox.  I just tried it and got 70 seconds through the GUI!  It didn’t used to take this long.  I’m comparing our development sandbox a few months ago to our development sandbox today.  Clearly something has changed.  The question is what!

I heard a couple people liked reading about the thought process for troubleshooting slow queries. If you aren’t familiar with what a database explain is used for, see that entry before reading this one.

This time the path to success was a little more meandering.  I’ve left the dead ends in here so you get a feel for the actual thought process and not a sanitized version.  if you are a student, this is a decent feel for what problems look like in the “real” world.

  1. First step – find piece of code in JForum that handles moving threads.
  2. It’s GenericForumDAO.moveTopics.  I see this method was modified by us (JavaRanch) to support leaving a copy of the thread in the original forum or creating a new copy and locking the original (for use in moving to our private “Trash Can” forum.)  We made that change a long time ago though and it wasn’t slow then.
  3. Looking at the method, I see a number of database queries.  I’m adding logging (nothing fancy – just System.out’s) to see which query is using up so much time.
  4. The method I added logging to takes 5-8 milliseconds.  (across two trials).  Yet the browser plugs away for almost a full minute after that.
  5. Ah!  The web/action layer class ModerationHelper makes a bunch of calls to clear/reload the cache.  I bet that’s where all the time is going.  Adding logging on that level and redeploying.
  6. Now we are getting somewhere.  Everything goes very quickly until the part where the hotest topics get reloaded and then that takes almost a minute.
  7. Yuck.  Table scans left and right.  No wonder it takes so long.  Three tables scans, joining on the result and THEN discarding almost all the records.  (see “original query” and “original explain” in supplemental materials section below)
  8. If I was doing this by brute force, I would add an index on topic_views and refactor the query to search on the the topics table only joining with the others on result.  This is similar to the last post where I walked through an optimization.  However, I’m not sure that makes sense here.
  9. This definition of hottest posts uses the most page views forever.  This is ok for now because all the migrated posts started with zero views against them.  Making the hottest threads really “the hottest threads since our Saturday January 3rd migration.”  What happens down the road.  When a thread stops being viewed altogether, it’s not hot anymore.  I’m thinking it would be better to put a time based constraint on things.
  10. I’ll post in our Moderators Only forum to get some feedback on my proposed definition change and go have dinner.
  11. Unanimous agreement on my proposed definition from everyone who saw the question while I was eating.  Including the site owner.  Good news.
  12. Modify the query to add a where clause by thread last modified date
  13. This modified query takes 80 thousand units instead of 253 thousand units.  Better but still way too high.
  14. Next I’ll try adding index on topic_last_post_id, topic_views
  15. Running explain again gives the exact same output.  It isn’t using the index.
  16. Dropping the index since it didn’t work.  I may re-add it later but I like to test only one variable at a time.
  17. The database seems to be favoring the table scan of the forum table first.  Probably because this table is order of magnitudes smaller than the other two in the join.
  18. Now I’ll try removing the forum table from the query and run an explain to see what happens.  It’s still 78 thousand units, but at least it is using an index (post_time) as the first step.
  19. Time to try to add the index and run explain again.  Nothing.  Still 78 thousand units.
  20. All right.  This tweaking thing isn’t getting me anywhere.  New approach.  How would I build this query if I were writing it from scratch.
  21. The base query that I want to happen first is explain select p.post_id from jforum_posts p WHERE p.post_time >= ‘2008-01-01’.  Running an explain on just this says it takes 1 thousand cost units.  That sounds high – I’ll query some details.  Over 100 thousand posts in a few days?  Oh yeah!  It’s 2009 – a New Year.  I should know that – I celebrated recently.
  22. Re-running the original query with the post_time clause gives me under two thousand cost units.  Whew.  That’s much better.
  23. Now I’m uploading the changed query to our sandbox environment to see how the move performs.  Three to five seconds.  Good.
  24. Another moderator suggested using replies and not views.  This is a good idea, but has no effect on performance so I switched it. (see “Final query” in supplemental materials)
  25. And a good techie verifies everything so running explain one last time to confirm this last change didn’t affect performance.  It didn’t.  Performance is still ok. ( see “Final explain” in supplemental materials)

Summary

Two orders of magnitude performance improvement on the database query and a web action that isn’t painful to wait for.  I’m happy with that.  Not only is it faster, but we uncovered a better business definition for the query.

I also learned the last time we tested a move on our sandbox server was before we tested migrating large quantities of data.  Making it an invalid test.

Supplemental materials

Original query

SELECT t.*, p.user_id AS last_user_id, p.post_time, p.attach AS attach, f.forum_name \
FROM jforum_topics t, jforum_posts p, jforum_forums f \
WHERE p.post_id = t.topic_last_post_id \
AND p.need_moderate = 0 \
and p.forum_id = f.forum_id \
ORDER BY topic_views DESC \
LIMIT ?

Original explain

Row #         QUERY PLAN
1       Limit (cost=253457.29..253457.39 rows=40 width=132)
2       -> Sort (cost=253457.29..254467.05 rows=403905 width=132)
3       Sort Key: t.topic_views
4       -> Hash Join (cost=26807.80..136006.35 rows=403905 width=132)
5       Hash Cond: (“outer”.forum_id = “inner”.forum_id)
6       -> Hash Join (cost=26707.81..129847.79 rows=403905 width=113)
7       Hash Cond: (“outer”.post_id = “inner”.topic_last_post_id)
8       -> Seq Scan on jforum_posts p (cost=0.00..50770.74 rows=1775019 width=24)
9       Filter: (need_moderate = 0)
10       -> Hash (cost=19386.05..19386.05 rows=403905 width=93)
11       -> Seq Scan on jforum_topics t (cost=0.00..19386.05 rows=403905 width=93)
12       -> Hash (cost=99.79..99.79 rows=79 width=27)
13       -> Seq Scan on jforum_forums f (cost=0.00..99.79 rows=79 width=27)

Final query

SELECT t.*, p.user_id AS last_user_id, p.post_time, p.attach AS attach, f.forum_name \
FROM jforum_topics t, jforum_posts p, jforum_forums f \
WHERE p.post_time >= ? \
and p.post_id = t.topic_last_post_id \
AND p.need_moderate = 0 \
and p.forum_id = f.forum_id \
ORDER BY topic_replies DESC \
LIMIT ?

Final explain:

Row #         QUERY PLAN
1       Limit (cost=1774.18..1774.23 rows=20 width=132)
2       -> Sort (cost=1774.18..1774.28 rows=40 width=132)
3       Sort Key: t.topic_replies
4       -> Hash Join (cost=99.99..1773.12 rows=40 width=132)
5       Hash Cond: (“outer”.forum_id = “inner”.forum_id)
6       -> Nested Loop (cost=0.00..1672.53 rows=40 width=113)
7       -> Index Scan using idx_posts_time on jforum_posts p (cost=0.00..678.22 rows=177 width=24)
8       Index Cond: (post_time >= ‘2009-01-05 00:00:00’::timestamp without time zone)
9       Filter: (need_moderate = 0)
10       -> Index Scan using idx_topics_lp on jforum_topics t (cost=0.00..5.61 rows=1 width=93)
11       Index Cond: (“outer”.post_id = t.topic_last_post_id)
12       -> Hash (cost=99.79..99.79 rows=79 width=27)
13       -> Seq Scan on jforum_forums f (cost=0.00..99.79 rows=79 width=27)

javaranch/jforum – nailing down a production problem

JavaRanch successfully cutover to our new forum software yesterday.  Unsurprisingly, we had a variety of small problems that cropped up.  One of them was the fact that anonymous users were able to post.

Below follows the thought process and steps involved in fixing it.  And then in fixing the problems caused by the fix!

  • A moderator noticed anonymous users can post – that’s not good
  • I looked at “general” group and saw deny anonymous posts was selected for all forums.  This looks right.  Hmm.  What’s the fastest thing I can change to prevent anonymous users from posting?
  • I know!  I’ll create a new group, put anonymous in it and give that group read only access.
  • Seems to work.
  • I logged a bug that when not logged in we see “new topics” in one stray place.  (At the time this doesn’t seem related.)
  • Things seem stable.  Time to get off the computer and enjoy Saturday night.
  • Another moderator asked why that was a bug since he gets redirected to login at that point. (This moderator is in Europe and had a good chunk of Sunday before I woke up.  It’s nice how things get done around the world.)
  • Chat session with this moderator – we realized he saw one behavior on our sandbox and I saw the error in production.  We discussed the behavior on test is correct/desired.  I need to figure out why it doesn’t work that way in production.  End chat session.
  • I wonder if the database is out of synch due to our data migration yesterday.
  • I edit ed the new group to force a database update.  If the database wasn’t out of synch, this doesn’t hurt anything.
  • I opened a new browser and tried accessing when logged out.  I get the expected behavior now.  I see the button and am prompted to login when I click it.
  • Now to tie up loose ends – I made the same “change” (edit without changing anything) to the “general” group, moved anonymous back there and got rid of the group created yesterday.
  • Marking it off on the bug list and all done.

tuning jforum for javaranch

JavaRanch has been updating our forum software to an extension of JForum.  While testing, Ulf Dittmer noticed that the RSS feeds were incredibly slow – to the point of timing out.  He found that the database query itself was taking so long that the browser gave up.

JavaRanch Moose
Some of the queries built into JForum are written in a way that is fine for forums with a relatively small number of posts – even half a million posts total.  JavaRanch is much larger than that.  We imagine that we are the largest forum to use JForum.  (We have over 400 thousand threads/topics and over 1.7 million posts.)

I enjoy tuning queries, so I started looking at it.  When someone asked me about what I did, I thought it would be nice to blog about my thought process.  I’ve included the before/after queries and execution plans at the bottom so as not to break up the thought process.

1. Run original query at command line in our test environment (which is slightly smaller than the real data set, but close to realistic.)
2. Give up waiting after 5 minutes.
3. Run explain on the original query.
4. Cringe at the heap scan with a relative cost of 16628 – and worse within a nested loop.
5. Wonder why the query needs to do so much processing to get a limited number of rows.  The execution plan shows all that joining occurring on the entire table rather than just the 50 or so rows we care about (via the limit.)
6. Refactor the query to use a subquery to avoid this.  I had used this technique on another similar query the day before and gotten the query down to under 1000 cost units.
7. Run explain again.
8. See cost is 3807.  This is better, but there’s no logical reason it should be four times the cost of the query from yesterday.  Better keep tuning.
9. Hmm.  It’s using the index idx_topics_fp which is on topic_first_post_id.  We have almost 100 forums.  Wouldn’t it be more efficient to include the forum_id in the index.
10. Added index idx_topics_fp_forum on both topic_first_post_id and forum_id.
11. Run explain again.
12. See cost of 978.  Good.  This is about what I got on yesterday’s query.  I guess the built in indexes better matched what I needed on yesterday’s query.
13. Go back to GUI and try URL.  Takes a couple seconds.
14. Commit my changes.  I got a two order of magnitude increase on the query.  Very satisfying.


Original query:

PostModel.selectLatestByForumForRSS = SELECT p.topic_id, p.topic_id, p.post_id, p.forum_id, pt.post_subject AS subject, pt.post_text, p.post_time, p.user_id, u.username, u.user_first_name, u.user_last_name
FROM  jforum_topics t, jforum_posts p, jforum_posts_text pt, jforum_users u
WHERE p.post_id = t.topic_first_post_id
AND p.topic_id = t.topic_id
AND p.user_id = u.user_id
AND p.post_id = pt.post_id
AND p.need_moderate = 0
ORDER BY t.topic_first_post_id DESC

Modified query

PostModel.selectLatestByForumForRSS = SELECT p.topic_id, p.topic_id, p.post_id, p.forum_id, pt.post_subject AS subject, pt.post_text, p.post_time, p.user_id, u.username, u.user_first_name, u.user_last_name
FROM (select topic_first_post_id from jforum_topics where forum_id = ? order by topic_first_post_id desc limit ?) as nested,
jforum_topics t, jforum_posts p, jforum_posts_text pt, jforum_users u
WHERE p.post_id = nested.topic_first_post_id
AND p.topic_id = t.topic_id
AND p.user_id = u.user_id
AND p.post_id = pt.post_id
AND p.need_moderate = 0
ORDER BY t.topic_first_post_id DESC

Original explain

Row #         QUERY PLAN
1       Limit (cost=97566.69..97566.70 rows=1 width=504)
2       -> Sort (cost=97566.69..97566.70 rows=1 width=504)
3       Sort Key: t.topic_id
4       -> Nested Loop (cost=275.61..97566.68 rows=1 width=504)
5       -> Nested Loop (cost=275.61..97560.76 rows=1 width=473)
6       -> Nested Loop (cost=275.61..97556.05 rows=1 width=32)
7       Join Filter: (“inner”.topic_id = “outer”.topic_id)
8       -> Bitmap Heap Scan on jforum_topics t (cost=275.61..16628.72 rows=18174 width=8)
9       Recheck Cond: (forum_id = 1)
10       -> Bitmap Index Scan on idx_topics_forum (cost=0.00..275.61 rows=18174 width=0)
11       Index Cond: (forum_id = 1)
12       -> Index Scan using jforum_posts_pkey on jforum_posts p (cost=0.00..4.44 rows=1 width=24)
13       Index Cond: (p.post_id = “outer”.topic_first_post_id)
14       Filter: (need_moderate = 0)
15       -> Index Scan using jforum_posts_text_pkey on jforum_posts_text pt (cost=0.00..4.70 rows=1 width=449)
16       Index Cond: (“outer”.post_id = pt.post_id)
17       -> Index Scan using jforum_users_pkey on jforum_users u (cost=0.00..5.91 rows=1 width=35)
18       Index Cond: (“outer”.user_id = u.user_id)

Explain on modified query
Row #         QUERY PLAN
1       Sort (cost=3807.84..3807.94 rows=40 width=504)
2       Sort Key: t.topic_id
3       -> Nested Loop (cost=0.00..3806.78 rows=40 width=504)
4       -> Nested Loop (cost=0.00..3579.24 rows=40 width=500)
5       -> Nested Loop (cost=0.00..3342.41 rows=40 width=469)
6       -> Nested Loop (cost=0.00..3164.39 rows=40 width=453)
7       -> Limit (cost=0.00..2975.39 rows=40 width=4)
8       -> Index Scan Backward using idx_topics_fp on jforum_topics (cost=0.00..1351866.51 rows=18174 width=4)
9       Filter: (forum_id = 1)
10       -> Index Scan using jforum_posts_text_pkey on jforum_posts_text pt (cost=0.00..4.70 rows=1 width=449)
11       Index Cond: (pt.post_id = “outer”.topic_first_post_id)
12       -> Index Scan using jforum_posts_pkey on jforum_posts p (cost=0.00..4.44 rows=1 width=24)
13       Index Cond: (p.post_id = “outer”.topic_first_post_id)
14       Filter: (need_moderate = 0)
15       -> Index Scan using jforum_users_pkey on jforum_users u (cost=0.00..5.91 rows=1 width=35)
16       Index Cond: (“outer”.user_id = u.user_id)
17       -> Index Scan using jforum_topics_pkey on jforum_topics t (cost=0.00..5.68 rows=1 width=4)
18       Index Cond: (“outer”.topic_id = t.topic_id)

Explain after index
Row #         QUERY PLAN
1       Sort (cost=978.89..978.99 rows=40 width=504)
2       Sort Key: t.topic_id
3       -> Nested Loop (cost=0.00..977.83 rows=40 width=504)
4       -> Nested Loop (cost=0.00..741.01 rows=40 width=473)
5       -> Nested Loop (cost=0.00..513.47 rows=40 width=469)
6       -> Nested Loop (cost=0.00..335.45 rows=40 width=453)
7       -> Limit (cost=0.00..146.44 rows=40 width=4)
8       -> Index Scan Backward using idx_topics_fp_forum on jforum_topics (cost=0.00..66535.36 rows=18174 width=4)
9       Index Cond: (forum_id = 1)
10       -> Index Scan using jforum_posts_text_pkey on jforum_posts_text pt (cost=0.00..4.70 rows=1 width=449)
11       Index Cond: (pt.post_id = “outer”.topic_first_post_id)
12       -> Index Scan using jforum_posts_pkey on jforum_posts p (cost=0.00..4.44 rows=1 width=24)
13       Index Cond: (p.post_id = “outer”.topic_first_post_id)
14       Filter: (need_moderate = 0)
15       -> Index Scan using jforum_topics_pkey on jforum_topics t (cost=0.00..5.68 rows=1 width=4)
16       Index Cond: (“outer”.topic_id = t.topic_id)
17       -> Index Scan using jforum_users_pkey on jforum_users u (cost=0.00..5.91 rows=1 width=35)
18       Index Cond: (“outer”.user_id = u.user_id)