Scott and I are working on our second book (OCP 8). I’m excited that I get to write the part about regular expressions as that is one of my favorite programming topics. Why, you ask? Because it lets you write clear and efficient code.
The scenario
For the book, I wrote an example showing how validating a simplified phone number is so much easier with a regular expression. The rules for the example were:
- a phone number is exactly 10 digits
- a phone number may contain dashes to separate the first three digits and next three digits, but not anywhere else
- no other characters are allowed (no parens around the area code in this example)
For example, 123-456-7890, 123-4567890 and 123456-7890 are valid. In real life, the third one wouldn’t be; we allow this typo here to be nice. However, dashes aren’t allowed in random positions: 12-45-67-890 is not a phone number.
Without regular expressions
This isn’t in the book, but I tried to write the code “the long way” to confirm it was annoyingly long. It was. I tried to write the code in a readable way and the best I could think of was:
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

private static boolean validateLong(String original) {
    String phone = original;
    // remove first dash (if present); the length check avoids an
    // exception on inputs shorter than four characters
    if (phone.length() > 3 && phone.charAt(3) == '-') {
        phone = phone.substring(0, 3) + phone.substring(4);
    }
    // remove second dash (if present); it sits at index 6 once the
    // first dash is gone
    if (phone.length() > 6 && phone.charAt(6) == '-') {
        phone = phone.substring(0, 6) + phone.substring(7);
    }
    // validate 10 characters left
    if (phone.length() != 10) {
        return false;
    }
    // validate only numbers left
    Set<Character> digits = new HashSet<>(Arrays.asList('0', '1', '2', '3',
            '4', '5', '6', '7', '8', '9'));
    for (int i = 0; i < phone.length(); i++) {
        if (!digits.contains(phone.charAt(i))) {
            return false;
        }
    }
    return true;
}
This is a lot of code. And to those who think regular expressions are unreadable, what do you think of the above? I don’t find it easy to see what is going on even though I wrote it. There’s just too much logic and too much detail to get right. (And no, it didn’t work on my first attempt.)
With regular expressions
Rewriting it to use a regular expression gives me this:
private static boolean validate(String phone) {
    String threeDigits = "\\d{3}";
    String fourDigits = "\\d{4}";
    String optionalDash = "-?";
    String regEx = threeDigits + optionalDash + threeDigits + optionalDash + fourDigits;
    return phone.matches(regEx);
}
Even if you don’t know the regular expression syntax, it should be obvious what is going on here. We look for three digits, an optional dash, three more digits, another optional dash and a final four digits.
It’s a tiny bit longer in the book version because {3} isn’t on the exam, so that part is:
String threeDigits = "\\d\\d\\d";
String fourDigits = "\\d\\d\\d\\d";
Still. Way easier to read and faster to write than the original code without regular expressions. I consider regular expressions like a hammer. They aren’t the right tool for every job, but they are quite helpful when they are the right tool.
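To try the composed regular expression against the sample numbers from the scenario, here is a minimal, self-contained sketch. The class name PhoneValidationDemo, the constant names, and the precompiled Pattern are my own assumptions for illustration; the version above simply calls phone.matches(regEx), which behaves the same way for these inputs.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original post or the book.
class PhoneValidationDemo {

    // Named pieces, mirroring the variables in the post
    private static final String THREE_DIGITS = "\\d{3}";
    private static final String FOUR_DIGITS = "\\d{4}";
    private static final String OPTIONAL_DASH = "-?";

    // Assumption: compile once and reuse, rather than calling
    // String.matches (which recompiles the pattern on every call)
    private static final Pattern PHONE = Pattern.compile(
            THREE_DIGITS + OPTIONAL_DASH + THREE_DIGITS + OPTIONAL_DASH + FOUR_DIGITS);

    static boolean validate(String phone) {
        // matches() requires the whole input to match, like String.matches
        return PHONE.matcher(phone).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"123-456-7890", "123-4567890", "123456-7890", "12-45-67-890"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + validate(sample));
        }
    }
}
```

The first three samples should be accepted and the last one rejected, matching the rules listed in the scenario.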
I like the idea of making REs readable by building up expressions from named pieces. Should help prevent some errors and be easier to modify too.
Me too! I’m amazed it isn’t more common.
This is a nice technique, which I first read at Martin Fowler’s site: http://martinfowler.com/bliki/ComposedRegex.html
It is a good practice; we should encourage it.
Definitely like building regex in parts. I’ve found http://rick.measham.id.au/paste/explain.pl to be very helpful interpreting crazy regex.
A new and powerful weapon in my Java arsenal! Thanks!!
Do you know when your OCP 8 book will be ready Jeanne? I’m currently studying for OCA 8 with your book, which is great, would like to get the OCP 8 one too!
We are working on the OCP book. My guess is in the fall. Remember the OCP 8 objectives aren’t even out yet.
Regular expressions were removed from the exam in Java 8 so we never published what I wrote on this.