a response to – “What would you say is the average percentage of development time devoted to creating the unit test scripts?”

How long does it take to compile? We don’t ask that. It would be absurd. The fact that people ask how long unit testing takes means they see it as an optional cost to be incurred. What I want to know is why they don’t ask for a similar accounting of the cost of NOT writing unit tests!

I think the “cost” of unit testing falls into three categories:
1) The early days/training/learning
2) The ongoing cost
3) The cost of not testing/cost of errors

The early days/training/learning
The first time you do something, it takes longer.  This is why we pay experienced people more. It is expected there is a training or learning cost. If an activity is worthwhile, one recoups this cost quickly.  It is called an investment.

The problem I see is that some teams don’t get past this point. They see unit testing taking longer the first time and imagine it will take that long forever. I wonder how these people learned Java or regular expressions or anything else, except that they wanted to learn the hard tech. If one believes the current buggy state of affairs works or can’t imagine a better way, staying motivated to get over the learning hump is difficult. This is why having a mentor or coach promoting unit testing is helpful.

The ongoing cost
When writing unit tests as part of the task, it is difficult to measure the amount of time it takes. I really can’t tell you what percentage of development time I spend “writing tests” because it occurs at the same time I do the other parts of the task. I also couldn’t account for the percentages of time I spend typing, thinking, compiling, etc. These things are occurring simultaneously.

Many of the people who complain the ongoing cost of testing is too high are treating it as a separate activity. They write the code, manually test it, debug, repeat and then write a unit test. If done that way, the entire cost of writing the test is extra, and it is often resented because they really are “done” before they start writing the tests. In this case, the unit tests only have value as regression tests, not as part of the development process. While I suppose it is better than nothing, I find this a way to make writing tests more costly than it should be.

I’m not saying one has to use TDD (test driven development) to see a benefit. Using the unit tests as a replacement for that manual testing turns them into a cost you need to incur anyway. Granted, it takes a little longer to write a test than to test by hand. But the difference is minimal. It is also offset by far fewer regression issues and the ability to test error conditions easily. And if you write tests in close proximity to the code, it is even faster.

There’s another measurement issue going on here.  Yes, it takes longer to write tests than test manually ONCE.  If you are writing a script that will only be run once, it isn’t worth it.  How many times have you had an application that was written once and then never touched again?

The cost of not testing/cost of errors

This is the part that really bothers me. There is a cost to not doing an activity. It tends to be swept under the rug and treated as a cost of doing business. It is hard to see opportunity cost. But it still exists. The two biggest costs I see are finding errors late and regression errors in future releases.

  1. We’ve all seen the curve that shows how fixing errors late is much more expensive than finding them in development. Unfortunately, developers get to claim they are “done” when they’ve really just moved the errors to later. The errors are more expensive, but the developer gets to claim they finished the task in X days. Managers need to stop allowing this.
  2. The future release problem is even harder. I think more of the value of having unit tests comes from the future. Some developers claim that they don’t need unit testing because they produce high quality code without it. Sometimes this is even true. However, what happens when that developer looks at the code in a year or leaves the team? The unit tests are code and live on. My development velocity on maintenance/enhancement tasks is much higher with unit tests because I don’t have to guess what I was thinking a year ago when I wrote the code.

What if my management still needs a cost

This blog post is inspired by someone asking me this question.  “What would you say is the average percentage of development time devoted to creating the unit test scripts?”  While I take issue with the question, I think he is still going to have it.  As a result, here are links to three articles/webpages that use numbers.

  1. Misko Hevery comes up with a figure of 10% cost. He calls it a 10% tax and points out the benefits that come with a tax. Note that he is writing tests as an integral part of his process and is fluent in doing so. He also actively dispels the myth that testing takes twice as long.
  2. Brian Johnston discusses costs. I like that he covers the cost of not testing too.   While he picks extreme figures (based on the worst case myth), it does cover risks and things to take into account for your own shops.  Also, he discusses the hardest 15% of tests.  The earlier tests written are the easier ones and cost less.
  3. A variety of opinions wiki’d. While numbers are mentioned, the real value of the page is the caveats for dealing with such numbers.

Conclusion

When talking to management about a cost, make sure they know about the associated benefits.  And the cost of other options.  Doing nothing is an option and has a definite cost!

see part 2 of this blog post

The “Reinventing the Wheel” Anti-Pattern

As a moderator on the JavaRanch, I often come across posts asking how to reinvent features that are available in most application servers. In JDBC for example, I’ve had people ask how to implement their own database connection pooling and how to create their own JDBC driver. Often times the developer is trying to create something that already exists, but they are unsure how to use it. The general rule of thumb I tell programmers is, “If you feel like you are reinventing the wheel, you probably are”.

Review the API
If you ever get the feeling you are reinventing the wheel, it’s a pretty good indication you are. In such situations, ask yourself “Is it likely a developer using the same component would need functionality X?”. If the answer is “yes”, then there’s a good chance there’s already such a feature in the API. Most often, reinventing the wheel comes from developers who are too lazy to review the API but not lazy enough to avoid rewriting the feature they think they are missing. Also, search the web. In some cases, you may need to download a new or updated library to get the feature you want, but this may give you access to even more features.

Your wheel isn’t better
Often called “Reinventing the square wheel”, it is likely the code you are recreating is worse, more buggy, and far less stable than the code you should be using. If you consider it for a moment, it makes sense a method built within the API should be better than a method built on top of the API. The API developer has access to private methods and objects that you do not have access to, and therefore your solution is limited by the public/external methods of the API. Furthermore, the fact the code is part of the API means scores of developers have hopefully reviewed the code for errors and performance enhancements.

That doesn’t necessarily mean the API implementation is better, but whenever I hear a developer say “I have a faster and cooler way of doing this than the way they do it in the library” I cringe at the thought of what they may have written. If you do happen to create a better wheel, join the open source project and publish it for others to judge.

There’s always public humiliation
Many of the best (or worst) articles that appear on The Daily WTF come from programmers reimplementing the most basic functions of a language. One of the things that separates an experienced programmer from a beginner is the ability to recognize what tools in the API are needed for a task and how quickly they can be put together. And with that, I present a list of examples written by real developers and posted online for the world to see:

And my personal favorite: I’ve heard from some good sources that the next version of SQL will use the word “GIMMIE” instead of “SELECT”

Java + Cron Job = Quartz

One of my favorite, often least used, open source tools for Java/J2EE applications is the Cron Job scheduling tool Quartz. Anyone who’s ever administered Linux or a web server is probably familiar with creating and modifying cron jobs to run a process at a specific time of the day/week/month. For example, you may need a nightly clean job for a data directory, or you may need to generate reports automatically at the end of the week. What I like about Quartz is that it’s simple to use, works in both Java and J2EE server-based applications, and is easy to install.

Java with operating-system cron jobs
Despite the availability of Quartz and similar Java-based tools, some developers still choose to use the operating system crontab and set it up to call Java methods directly. Although this can work well in practice, it’s not a very stable solution. For example, if the Java home variable changes, the cron job could break. Also, it’s not portable, since each operating system has a slightly different scheduling tool. Most importantly, though, the application is more vulnerable to attack since it requires input from a process outside the JVM.

What is Quartz?
Quartz is an open source scheduling module written entirely in Java, which lives inside the JVM. It has complete support for creating jobs based on crontab-like syntax, such as using the string “0 0 4 * * ?” to run a job every day at 4am. Note that, unlike a Unix crontab line, a Quartz expression starts with a seconds field, so the order is seconds, minutes, hours, day of month, month, day of week. It also supports a more rigorous non-crontab syntax for schedules that can’t be specified in a single string. Anytime a developer needs to write a process that runs in the background, whether it’s run once a day or every 5 minutes, they should consider Quartz for their scheduling needs.
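
As a rough illustration of the crontab-like syntax: a Quartz cron expression has six required fields, beginning with seconds, plus an optional year. The sketch below simply labels each field of an expression. CronFieldLabels is a hypothetical helper written for this post, not part of the Quartz API, and it does no validation of the values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of Quartz.
class CronFieldLabels {
    // Field order of a Quartz cron expression; the seventh field (year) is optional.
    private static final String[] NAMES = {
        "seconds", "minutes", "hours", "day-of-month", "month", "day-of-week", "year"
    };

    // Pairs each whitespace-separated field with its name. Values are not validated.
    static Map<String, String> label(String expression) {
        String[] parts = expression.trim().split("\\s+");
        if (parts.length < 6 || parts.length > 7) {
            throw new IllegalArgumentException("expected 6 or 7 fields, got " + parts.length);
        }
        Map<String, String> labeled = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            labeled.put(NAMES[i], parts[i]);
        }
        return labeled;
    }

    public static void main(String[] args) {
        // Second 0, minute 0, hour 4, every day: a daily 4am schedule.
        System.out.println(label("0 0 4 * * ?"));
    }
}
```

Reading an expression field by field like this makes it easier to spot a schedule that fires at minute 4 of every hour when a daily 4am run was intended.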

Creating a Job
Even though you probably only have one job you want to schedule, all Quartz applications start by creating a scheduler that can support any number of jobs using the following code:

SchedulerFactory schedulerFactory = new org.quartz.impl.StdSchedulerFactory();
Scheduler scheduler = schedulerFactory.getScheduler();
scheduler.start();

From there, we can create our 4am job schedule by defining the job, defining the schedule, and then tying the two together by adding them to our scheduler instance, as below:

JobDetail job = new JobDetail("myJob",MyClass.class);
CronTrigger schedule = new CronTrigger("mySchedule",Scheduler.DEFAULT_GROUP,"0 0 4 * * ?");
scheduler.scheduleJob(job,schedule);

Finally, you create a job class, in this case MyClass, that implements the Job interface and has a method quite similar to a main method:

public class MyClass implements Job {
   public void execute(JobExecutionContext context) {
    ... // Perform job
   }
}

Keep in mind that this code to create the scheduler and job can be in any class. The only class-level restriction is that the job itself has to implement the Job interface. How and where the job is created is up to you.

J2EE: How to apply?
J2EE servers often run for long periods of time, so they are a natural fit for Quartz scheduling. For example, you can use Quartz to create reports out of large sets of data in the middle of the night when usage is low. There are dozens of ways to integrate Quartz with J2EE, but the two I prefer are:

  • Job calls a session bean method
  • Job creates a message and sends it to a JMS queue

In both cases, the job itself is *never* more than a page of code. It just picks up what it was called for and executes a J2EE call. In this manner, you might have a bean called ReportBean with a method on the bean called generateNightlyReport(). The Quartz job would be a short segment of code that connects to the bean and executes the session bean command.

My favorite method, though, is to have a job create a message and send it to a JMS queue, since the Quartz process can return without waiting for the actual job to finish. Also, the job does not require a transaction or context since it’s going to a queue instead of executing a bean directly. As long as you have a messaging bean watching the queue, the job will get executed soon after the Quartz scheduler has finished processing the request.
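
The hand-off idea can be sketched without a J2EE server. In the sketch below, an in-memory java.util.concurrent.BlockingQueue stands in for the JMS queue, and plain methods stand in for the Quartz job and the messaging bean. All of the names (QueueHandOffSketch, runScheduledJob, handleNext) are hypothetical, and a real application would use the JMS API instead.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustration only: an in-memory queue stands in for a JMS queue.
class QueueHandOffSketch {
    static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>();

    // Plays the role of the scheduled job: post a request and return at once,
    // without waiting for the work itself to be done.
    static void runScheduledJob() {
        QUEUE.offer("generateNightlyReport");
    }

    // Plays the role of the messaging bean watching the queue.
    // Returns null when there is nothing to process.
    static String handleNext() {
        String request = QUEUE.poll();
        return request == null ? null : "handled " + request;
    }

    public static void main(String[] args) {
        runScheduledJob();                // returns immediately
        System.out.println(handleNext()); // the work happens later, elsewhere
    }
}
```

The point of the design is that the scheduler thread only pays for the enqueue, while the long-running work is picked up by whatever is listening on the queue.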

Some tips
Hopefully this article has given you a taste for Quartz as a scheduling tool. While I am aware there are other scheduling tools in Java, Quartz has always worked right out of the box for me with very little effort, so to be honest I’ve never had a reason to try another. Here are some tips I recommend to write good Quartz applications:

  • Keep your job class under a page. If you find yourself writing a very large job class, extract the useful code into a separate class and have the job code call that class. In this manner, there’s very little code actually tied to your scheduler and you can reuse the class outside the context of the scheduler.
  • If your schedule executes often or your jobs are quite long, there’s the distinct possibility a job could be started while the last job is running. For example, if a job runs every 2 minutes and the first job is taking 3 minutes, Quartz won’t block the second job from starting so you will have multiple instances of the same job running at once. While there are probably ways to prevent this within Quartz, one sanity check I like to enforce is a semaphore lock that prevents two threads from executing the same code at the same time. In the case a second job is started while the first is running, the second should just exit instead of waiting for the first to finish. In Java, the check and the set have to happen as one atomic step, for example with a compare-and-set on a java.util.concurrent.atomic.AtomicBoolean; reading and then writing a plain int flag leaves a window in which two threads can both get through.
  • Quartz is often included in a number of J2EE server packages, so you may have it without the need to import the libraries. Keep in mind, though, that the existing version installed with the J2EE server may be older than the one you want to use. In that case, you may want to import your own Quartz jar into the application.
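
The "exit if already running" check from the tips above can be sketched with an AtomicBoolean. SingleRunGuard and runIfIdle are hypothetical names used only for illustration. In a real Quartz job the guard would need to be shared between executions (for example as a static field), on the assumption that the scheduler creates a new job instance for each run.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustration only: lets one caller in at a time and makes late arrivals skip.
class SingleRunGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);

    // Runs the task unless another run is in progress. Returns true if it ran.
    boolean runIfIdle(Runnable task) {
        // compareAndSet checks and flips the flag in one atomic step.
        if (!running.compareAndSet(false, true)) {
            return false; // someone else is running: exit instead of waiting
        }
        try {
            task.run();
            return true;
        } finally {
            running.set(false); // always release, even if the task throws
        }
    }

    public static void main(String[] args) {
        SingleRunGuard guard = new SingleRunGuard();
        boolean first = guard.runIfIdle(() -> {
            // Simulate a second trigger firing while the first is still running.
            boolean overlapping = guard.runIfIdle(() -> { });
            System.out.println("overlapping run executed: " + overlapping);
        });
        System.out.println("first run executed: " + first);
    }
}
```

Following the first tip, the job's execute method would then be little more than a call to runIfIdle wrapping the real work, which lives in its own class.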