Singapore Address Parser

With a little regular expression work, we can parse almost anything that follows a predefined pattern.

Singapore addresses are a good example of data that can be parsed with a regular expression. I wrote the code in this article today; feel free to use it for your own purposes.

I use PHP for my parser, but it can be ported to any other language that supports regular expressions.

The test suite

I have the following example. I found it somewhere on the Internet, but the source is not important:

Blk 900 SOUTH WOODLANDS DR #02-01 WOODLANDS CIVIC CENTRE, SINGAPORE 730900

By rearranging it, I get some variations of the above example:

Blk 900 SOUTH WOODLANDS DR
900 SOUTH WOODLANDS DR, SINGAPORE 730900
Blk 900 SOUTH WOODLANDS DR #02-01 WOODLANDS CIVIC CENTRE
900 SOUTH WOODLANDS DR #02-01 WOODLANDS CIVIC CENTRE, SINGAPORE
Blk 900 SOUTH WOODLANDS DR #02-01 WOODLANDS CIVIC CENTRE, SINGAPORE 730900

If anyone has other variations, please comment.
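The variations above can be kept in an array and re-run whenever the pattern changes. This is only a minimal sketch: `failed_addresses` and `standin_parser` are hypothetical names used for illustration, and the stand-in only checks for a number followed by whitespace. In real use, pass the name of the actual parser instead.

```php
<?php
// The variations listed above, kept in one place so they can be re-run.
$cases = array(
  "Blk 900 SOUTH WOODLANDS DR",
  "900 SOUTH WOODLANDS DR, SINGAPORE 730900",
  "Blk 900 SOUTH WOODLANDS DR #02-01 WOODLANDS CIVIC CENTRE",
  "900 SOUTH WOODLANDS DR #02-01 WOODLANDS CIVIC CENTRE, SINGAPORE",
  "Blk 900 SOUTH WOODLANDS DR #02-01 WOODLANDS CIVIC CENTRE, SINGAPORE 730900",
);

// Hypothetical helper: runs $parser over every case and returns the inputs
// for which it returned false (the parser's "no match" value).
function failed_addresses($parser, $cases){
  $failed = array();
  foreach ($cases as $case){
    if (call_user_func($parser, $case) === false){
      $failed[] = $case;
    }
  }
  return $failed;
}

// Stand-in parser for illustration only: it just checks for a number
// followed by whitespace. It is NOT the address parser itself.
function standin_parser($address){
  return preg_match('/[0-9]+\s/', $address) ? array('raw' => $address) : false;
}

$failed = failed_addresses('standin_parser', $cases);
echo count($failed), " of ", count($cases), " variations failed\n";
```

Any function that takes an address string and returns false on failure can be passed as the first argument.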

Code Snippet

OK, here is my little parser (in PHP):

/**
 * Singapore Address Parser
 * Author: Khoi Nguyen
 * http://kafeblog.com
 * 20 May 2008
 **/

function parse_sin_address($address){
  // Pattern fragments, one per address part; only number and street are required.
  $blk      = "(?:(?:Blk|Block)\s+)?";  // optional prefix, matched but not captured
  $blkno    = "([0-9]+)\s+";            // 1: block or house number
  $street   = "([^#,]+)";               // 2: street, up to the unit number or comma
  $room     = "(#[0-9\-]+)?\s*";        // 3: optional unit number, e.g. #02-01
  $bld      = "([^,]+)?";               // 4: optional building name
  $comma    = ",?\s*";
  $country  = "([A-Za-z\s]+)?";         // 5: optional country
  $postal   = "([0-9]+)?";              // 6: optional postal code
  $re = "/^\s*" . $blk . $blkno . $street . $room . $bld . $comma . $country . $postal . "\s*$/i";
  if (preg_match($re, $address, $matches)){
    // preg_match drops trailing groups that did not match, so pad to 7 entries
    $matches = array_pad($matches, 7, "");

    // Build result array, trimming the whitespace the greedy groups pick up
    return array(
      'number'    => $matches[1],
      'street'    => trim($matches[2]),
      'roomno'    => $matches[3],
      'building'  => trim($matches[4]),
      'country'   => trim($matches[5]),
      'postal'    => $matches[6]
    );
  }
  return false;
}
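PCRE also supports named groups, written `(?P<name>...)`, which let the result be read by name instead of by position. The following is only a sketch of that alternative under the same assumptions about the address layout. `parse_sin_address_named` is a hypothetical name and is not part of the code above.

```php
<?php
// Hypothetical variant of the parser using named groups.
function parse_sin_address_named($address){
  $re = '/^\s*(?:(?:Blk|Block)\s+)?'      // optional prefix, not captured
      . '(?P<number>[0-9]+)\s+'           // block or house number
      . '(?P<street>[^#,]+)'              // street, up to "#" or ","
      . '(?P<roomno>#[0-9\-]+)?\s*'       // optional unit number
      . '(?P<building>[^,]+)?,?\s*'       // optional building name, then optional comma
      . '(?P<country>[A-Za-z\s]+)?'       // optional country
      . '(?P<postal>[0-9]+)?\s*$/i';      // optional postal code
  if (!preg_match($re, $address, $m)){
    return false;
  }
  $result = array();
  foreach (array('number', 'street', 'roomno', 'building', 'country', 'postal') as $key){
    // Trailing groups that did not take part in the match are missing from $m.
    $result[$key] = isset($m[$key]) ? trim($m[$key]) : "";
  }
  return $result;
}

$detail = parse_sin_address_named("900 SOUTH WOODLANDS DR, SINGAPORE 730900");
echo $detail['street'], " | ", $detail['country'], " | ", $detail['postal'], "\n";
```

The trade-off is a longer pattern in exchange for not having to count capture positions when a new part is added.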

Usage

$address = "Blk 900 SOUTH WOODLANDS DR #02-01 WOODLANDS CIVIC CENTRE, SINGAPORE 730900";

$detail = parse_sin_address($address);
echo "\nNumber: ", $detail['number'];
echo "\nStreet: ", $detail['street'];
echo "\nRoom: ", $detail['roomno'];
echo "\nBuilding: ", $detail['building'];
echo "\nCountry: ", $detail['country'];
echo "\nPostal: ", $detail['postal'];

You will receive:

Number: 900
Street: SOUTH WOODLANDS DR
Room: #02-01
Building: WOODLANDS CIVIC CENTRE
Country: SINGAPORE
Postal: 730900
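The parser accepts any run of digits as the postal code. Singapore postal codes are six digits long, so a caller may want to check the parsed value separately. A minimal sketch, where `is_sin_postal` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: true only when $postal is exactly six digits.
// \z anchors at the very end of the string, so a trailing newline is rejected.
function is_sin_postal($postal){
  return preg_match('/^[0-9]{6}\z/', $postal) === 1;
}

var_dump(is_sin_postal("730900"));  // bool(true)
var_dump(is_sin_postal("7309"));    // bool(false)
```

This only checks the format; it does not confirm that the code is actually assigned to an address.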