Text preprocessing: Wrap punctuation with spaces using Java

Input text = “Hello-my $world?”

Output text = “Hello - my  $ world ? ”

(The space that already precedes “$” in the input is kept, so two spaces appear there.)


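import java.util.regex.Matcher;
import java.util.regex.Pattern;

public String wrapAroundPunctuation(String pText) {

    // Each punctuation character is replaced by itself padded with spaces.
    // For example, '*' will be replaced by " * ".
    Pattern p = Pattern.compile("\\p{Punct}");
    Matcher m = p.matcher(pText);
    StringBuffer sb = new StringBuffer();

    while (m.find()) {
        // '$' and '\' are special in replacement strings: a bare '$' throws
        // IllegalArgumentException (illegal group reference), and a bare '\'
        // is silently dropped. Matcher.quoteReplacement escapes both.
        String replacement = Matcher.quoteReplacement(m.group());

        m.appendReplacement(sb, " " + replacement + " ");
    }
    m.appendTail(sb);

    return sb.toString();
}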
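
A shorter variant of the same idea is sketched below. It assumes that the whole-match back-reference "$0" is acceptable in place of the explicit find/append loop; the class name PunctuationSpacer and the method name wrap are made up for this illustration. Because the matched text is inserted through a group reference, it is copied literally, so '$' and '\' need no special handling.

```java
import java.util.regex.Pattern;

public class PunctuationSpacer {

    // Compiled once and reused; by default \p{Punct} matches ASCII punctuation.
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    // Hypothetical helper: "$0" in the replacement refers to the whole match,
    // so every punctuation character is re-emitted with a space on each side.
    public static String wrap(String text) {
        return PUNCT.matcher(text).replaceAll(" $0 ");
    }

    public static void main(String[] args) {
        // Brackets make the trailing space visible.
        System.out.println("[" + wrap("Hello-my $world?") + "]");
        // prints [Hello - my  $ world ? ]
    }
}
```

If single spacing matters for later tokenization, the result can be normalized afterwards, for example with replaceAll("\\s+", " ").trim().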
