Suppose we want to replace some punctuation characters in a text string, and we want to use the POSIX character class \p{Punct} to do it. However, we do not want to cover every character in the \p{Punct} class. For example, in an XML text string we may want to replace every punctuation character in \p{Punct} except ‘.’, ‘/’, ‘<’ and ‘>’. We can construct a regular expression as follows:
<code>
String doc = "THE CONTENT OF YOUR XML DOCUMENT IS HERE.";
// Any \p{Punct} character that is NOT one of < > . /
String regex = "[\\p{Punct}&&[^<>./]]";
doc = doc.replaceAll(regex, "");
</code>
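The idea is to combine the character-class intersection operator ‘&&’ (a boolean conjunction) with the negation ‘^’: a character matches only if it is in \p{Punct} and is not one of ‘<’, ‘>’, ‘.’ or ‘/’.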
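A self-contained version of the snippet, run against a small sample XML fragment, might look like this. The class name, the `strip` helper and the sample input are hypothetical, chosen only for illustration:

```java
public class PunctFilterDemo {
    // \p{Punct} (US-ASCII punctuation) intersected with "anything except
    // < > . /", i.e. all punctuation minus those four characters.
    static final String REGEX = "[\\p{Punct}&&[^<>./]]";

    // Hypothetical helper: removes every punctuation character
    // except '<', '>', '.' and '/'.
    static String strip(String doc) {
        return doc.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not from the original post.
        String doc = "<to>Tove!</to><body>Don't forget, 3.5 kg.</body>";
        System.out.println(strip(doc));
        // prints: <to>Tove</to><body>Dont forget 3.5 kg.</body>
    }
}
```

Keep in mind that ‘=’ and the quote characters are also in \p{Punct}, so attributes such as id="1" would lose them; if your XML has attributes, add those characters inside the [^...] part as well.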
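[Update: March 21, 2008] What about negating a whole string rather than individual characters? That is not difficult either. The key is to use a lookahead in your regex: a negative lookahead (?!...) or a positive lookahead (?=...).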
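A minimal sketch of both kinds of lookahead, assuming "negating a string" means leaving a particular multi-character sequence untouched. The class, the method names and the ".com" example are hypothetical:

```java
public class LookaheadDemo {
    // Hypothetical helper: removes every \p{Punct} character except a '.'
    // that starts the sequence ".com" (followed by a word boundary).
    static String stripPunctExceptDotCom(String text) {
        // (?!\.com\b) is a negative lookahead: at a position where ".com"
        // begins it fails, so that '.' is never matched and is kept.
        return text.replaceAll("(?!\\.com\\b)\\p{Punct}", "");
    }

    // Hypothetical helper: removes punctuation only when it is immediately
    // followed by whitespace or the end of the input.
    static String stripTrailingPunct(String text) {
        // (?=\s|$) is a positive lookahead: it requires, without consuming
        // anything, that whitespace or the end of input comes next.
        return text.replaceAll("\\p{Punct}(?=\\s|$)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripPunctExceptDotCom("Visit example.com, now!"));
        // prints: Visit example.com now
        System.out.println(stripTrailingPunct("Hi, see a.b.c now."));
        // prints: Hi see a.b.c now
    }
}
```

Because a lookahead only checks what follows and consumes nothing, the excluded sequence itself stays in the text.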
Just stumbled across this post whilst trying to figure out how to replace all punctuation in a string except for some allowed values. Exactly what I was looking for; you saved me a lot of work. I guess I owe you a beer.
Thanks for the post, I was searching for the same thing!