Java regular expression: Replace and negate some punctuations

The situation is that we want to replace some punctuation characters in a text string, and we want to exploit the POSIX character class \p{Punct} to do it. However, we do not want to cover every character in the \p{Punct} class. For example, we want to replace all punctuation in \p{Punct} except ‘.’, ‘/’, ‘<’ and ‘>’ in an XML text string. We can construct a regular expression as follows:

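<code>

String doc = "THE CONTENT OF YOUR XML DOCUMENT IS HERE.";

// Any \p{Punct} character that is not '<', '>', '.' or '/'
String regex = "[\\p{Punct}&&[^<>./]]";

// Remove every matching character
doc = doc.replaceAll(regex, "");

</code>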

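The idea is to combine character-class intersection ‘&&’ with negation ‘^’: the resulting class matches only characters that are in \p{Punct} and are not in the excluded set.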
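To make the effect concrete, here is a minimal runnable sketch of the same pattern. The class name `PunctuationFilter`, the method `stripPunctuation`, and the sample XML string are made up for illustration; only the regular expression comes from the post.

```java
import java.util.regex.Pattern;

public class PunctuationFilter {
    // Intersection of \p{Punct} with "anything except < > . /".
    // Compiled once so it can be reused across calls.
    private static final Pattern UNWANTED_PUNCT =
            Pattern.compile("[\\p{Punct}&&[^<>./]]");

    // Hypothetical helper: removes every punctuation character
    // except '<', '>', '.' and '/'.
    public static String stripPunctuation(String text) {
        return UNWANTED_PUNCT.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Illustrative input, not from the original post.
        String doc = "<note id=\"1\">Hello, world! See a/b.txt (draft).</note>";
        System.out.println(stripPunctuation(doc));
        // prints: <note id1>Hello world See a/b.txt draft.</note>
    }
}
```

Note that ‘=’ and ‘"’ are also in \p{Punct}, so this pattern strips them from attributes as well. If the XML carries attributes you need to keep, add those characters to the excluded set.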

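[Update: March 21, 2008] Well, how about negating a whole string rather than a single character? It’s also not difficult. The key is using a lookahead in your regex: (?!...) is a negative lookahead, which succeeds only if the given pattern does not match at that position, and (?=...) is a positive lookahead, which succeeds only if it does.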
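As a rough sketch of the lookahead idea, the following shows two common uses of a negative lookahead: rejecting a line that contains a given word, and replacing a character only when it is not followed by a given string. The class name `LookaheadDemo`, both method names, and the sample inputs are assumptions chosen for illustration.

```java
public class LookaheadDemo {
    // True if the (single-line) input does not contain "DRAFT" anywhere.
    // (?!.*DRAFT) fails at the start if "DRAFT" appears later in the line.
    public static boolean lacksDraft(String line) {
        return line.matches("(?!.*DRAFT).*");
    }

    // Escapes '&' only when it is not already the start of "&amp;".
    // This sketch knows about "&amp;" only; other entities such as
    // "&lt;" would be escaped again.
    public static String escapeBareAmpersands(String text) {
        return text.replaceAll("&(?!amp;)", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(lacksDraft("final report"));   // prints: true
        System.out.println(lacksDraft("my DRAFT report")); // prints: false
        System.out.println(escapeBareAmpersands("Tom & Jerry &amp; friends"));
        // prints: Tom &amp; Jerry &amp; friends
    }
}
```

A positive lookahead, (?=...), works the same way with the condition inverted: it matches only where the given pattern does follow.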


2 Responses to Java regular expression: Replace and negate some punctuations

  1. Steve Rose says:

    Just stumbled across this post whilst trying to figure out how to replace all punctuation in a string except for some allowed values. Exactly what I was looking for, you saved me a lot of work – I guess I owe you a beer.

    🙂

  2. Divya says:

    Thanks for the post, I was searching for the same thing!
