Wednesday, June 16, 2010

Group pattern matching with regular expressions in Java and Scala

Even though group capturing in regular expressions has been around for some time, I have only recently used it, and I think it is nice enough to be worth blogging about.

The use case is pretty straightforward: you have a string of data and you want to extract values from it based on a pattern. One example is a date such as “16-Jun-2010”, from which you want the day, month and year. Another is an email address, from which you want the username and domain. I will show you how to extract the day, month and year values from a string using regular expressions. Regular expressions allow us to check the format of the string and also to group parts of the match, so that we can pull out our day, month and year values.

Java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

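public class GroupCaptureEx {
  public static void main(String[] args) {
    String input = "16-Jun-2010";
    // Three capturing groups: two-digit day, three-letter month, four-digit year.
    String patternStr = "(\\d{2})-([a-zA-Z]{3})-(\\d{4})";

    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(input);
    // find() looks for the pattern anywhere in the input. groupCount() is the
    // number of capturing groups in the pattern itself, so it is always 3 here.
    if (matcher.find() && matcher.groupCount() == 3) {
      // group(0) is the whole match; groups 1 to 3 are the parenthesised parts.
      System.out.println("Day is: " + matcher.group(1));
      System.out.println("Month is: " + matcher.group(2));
      System.out.println("Year is: " + matcher.group(3));
    } else {
      System.out.println("No match found or unexpected match found");
    }
  }
}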
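The email address case mentioned earlier can be sketched the same way. This is only a rough illustration: the class name EmailGroupCaptureEx, the helper splitEmail and the sample address are made up for this example, and the pattern is deliberately loose (anything without whitespace or "@" on either side of a single "@"), not a proper email validator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailGroupCaptureEx {
  // Hypothetical, deliberately loose pattern: group 1 is the username,
  // group 2 is the domain. It is not a real email address validator.
  private static final Pattern EMAIL = Pattern.compile("([^@\\s]+)@([^@\\s]+)");

  // Returns {username, domain}, or null when the whole input does not match.
  static String[] splitEmail(String input) {
    Matcher matcher = EMAIL.matcher(input);
    // matches() requires the entire input to fit the pattern.
    if (!matcher.matches()) {
      return null;
    }
    return new String[] { matcher.group(1), matcher.group(2) };
  }

  public static void main(String[] args) {
    String[] parts = splitEmail("jane.doe@example.com");
    if (parts != null) {
      System.out.println("Username is: " + parts[0]);
      System.out.println("Domain is: " + parts[1]);
    } else {
      System.out.println("No match found");
    }
  }
}
```

Here I used matches() rather than find(), so a string with extra text around the address is rejected instead of being partially matched.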


The Scala version uses Scala’s Regex class to simplify matters a little. Calling .r on a string compiles the pattern for you, so you don’t have to do that explicitly. Regex is also an extractor: when it is used in a pattern match, it pulls the captured groups out of the string and binds them to the variables named in the pattern. The only thing we need to concern ourselves with is Scala’s pattern matching ability. The pattern we are interested in matching on would look like:

DateRegex(day, month, year)

We then use Scala’s match expression (similar to switch in Java) on the input string. If the pattern DateRegex(day, month, year) matches the string, then we have a match, and day, month and year are bound to the three captured groups.

Scala
object RegExGroupCapture {
  def main(args : Array[String]) : Unit = {
    val Input = "16-Jun-2010"
    val DateRegex = """(\d{2})-([a-zA-Z]{3})-(\d{4})""".r
  
    Input match {
      case DateRegex(day, month, year) =>
        println("match found")
        println("Day: " + day)
        println("Month: " + month)
        println("Year: " + year)
      case _ => println("No match found")
    }
  }
}
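For comparison, Java 7 and later can get a little closer to the Scala version by using named groups, so the values are read by name rather than by position. The sketch below is only an illustration: the class name NamedGroupCaptureEx and the helper parseDate are made up for this example. It uses matches() because the Scala pattern match only succeeds when the whole string fits the pattern, whereas find() also accepts a match somewhere inside a longer string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupCaptureEx {
  // Named groups, written (?<name>...), need Java 7 or later.
  private static final Pattern DATE =
      Pattern.compile("(?<day>\\d{2})-(?<month>[a-zA-Z]{3})-(?<year>\\d{4})");

  // Hypothetical helper: returns {day, month, year}, or null when the
  // whole input does not match the pattern.
  static String[] parseDate(String input) {
    Matcher matcher = DATE.matcher(input);
    // matches() requires the entire input to fit the pattern.
    if (!matcher.matches()) {
      return null;
    }
    return new String[] {
      matcher.group("day"), matcher.group("month"), matcher.group("year")
    };
  }

  public static void main(String[] args) {
    String[] parts = parseDate("16-Jun-2010");
    if (parts != null) {
      System.out.println("Day: " + parts[0]);
      System.out.println("Month: " + parts[1]);
      System.out.println("Year: " + parts[2]);
    } else {
      System.out.println("No match found");
    }
    // Extra text around the date is rejected by matches(),
    // although find() would have located the date inside it.
    System.out.println(parseDate("Date: 16-Jun-2010") == null);
  }
}
```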

Thursday, June 3, 2010

Programming in Scala is fun

To learn more about what Scala can do, I decided to work through the problems listed on the Project Euler website.

"Project Euler is a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems."

I have only completed the first ten and have thoroughly enjoyed it. Not only have I learnt some neat tricks one can do with Scala, I have also learnt about a number of different mathematical algorithms. At some point I would like to go back over some of the problems and see how one would do them in Java, just to compare the two languages. I can only imagine it would be a lot less enjoyable and perhaps even a little painful.