Java program to remove all non-ASCII characters from a string

You might need to remove all non-ASCII characters from a string, for example when the text comes from a file or before you save it in a database. Java doesn’t provide a dedicated method for this, but it is easy to do with a regular expression (regex).

I will show you different ways to remove all non-ASCII characters from a string in Java.

Method 1: ASCII values regex pattern:

Let’s write a regex pattern that matches every character outside the valid ASCII range, i.e. not between 0 and 127. In hexadecimal, 0 is 0x00 and 127 is 0x7F, which a regex writes as \x00 and \x7F. The character class [^\x00-\x7F] therefore matches any character outside this range, and we can replace each match with an empty string. Inside a Java string literal the backslashes are doubled.

class Main {
    public static void main(String[] args) {
        String givenString = "©Hello←→⇒ ÃWorld ®";

        String finalString = givenString.replaceAll("[^\\x00-\\x7F]", "");

        System.out.println("Final string: "+finalString);
    }
}

The replaceAll method takes a regular expression as its first parameter and the replacement string as its second. It replaces every match of the regex with the replacement and returns a new string; the original string is not modified. The last line prints that string.

The regex in the above example matches all characters whose value is not in the ASCII range 0 to 127.

If you run the above code, it will print the below string:

Final string: Hello World 

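As you can see, all non-ASCII characters are removed. Because the replacement is an empty string, nothing is inserted in their place; the spaces in the output were already present in the original string.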
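The same range check can be written without a regex, which makes the "0 to 127" rule explicit. This is a minimal sketch under assumptions, not part of the original program: the class name CodePointAsciiFilter and the method name stripNonAscii are hypothetical, and the sample text is the same string as above, written with Unicode escapes so the source file stays ASCII-only.

```java
class CodePointAsciiFilter {
    // Hypothetical helper: keeps only code points in the ASCII range 0..127.
    static String stripNonAscii(String input) {
        StringBuilder result = new StringBuilder(input.length());
        input.codePoints()                      // iterate by code point, not by char
             .filter(cp -> cp <= 0x7F)          // 0x7F is 127, the last ASCII value
             .forEach(result::appendCodePoint);
        return result.toString();
    }

    public static void main(String[] args) {
        // Same sample text as in Method 1, written with Unicode escapes.
        String givenString = "\u00A9Hello\u2190\u2192\u21D2 \u00C3World \u00AE";
        // Brackets make the trailing space visible: prints [Hello World ]
        System.out.println("[" + stripNonAscii(givenString) + "]");
    }
}
```

The result should match what the regex version returns for the same input, since both keep exactly the code points from 0 to 127.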

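Method 2: Using \P{ASCII}:

We can also use \P{ASCII}. \p{ASCII} is a predefined character class that matches any ASCII character, and the uppercase \P negates it, so \P{ASCII} matches every non-ASCII character. Replacing those matches with an empty string gives the same result as Method 1.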

class Main {
    public static void main(String[] args) {
        String givenString = "©Hello←→⇒ ÃWorld ®";

        String finalString = givenString.replaceAll("\\P{ASCII}", "");

        System.out.println("Final string: "+finalString);
    }
}
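If the cleanup runs many times, the pattern can be compiled once and reused. According to the String documentation, str.replaceAll(regex, repl) gives the same result as Pattern.compile(regex).matcher(str).replaceAll(repl), so the sketch below only hoists the compile step into a constant. The class name AsciiCleaner and the method name stripNonAscii are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class AsciiCleaner {
    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern NON_ASCII = Pattern.compile("\\P{ASCII}");

    // Hypothetical helper: removes every character outside the ASCII range.
    static String stripNonAscii(String input) {
        return NON_ASCII.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Same sample text as above, written with Unicode escapes.
        String givenString = "\u00A9Hello\u2190\u2192\u21D2 \u00C3World \u00AE";
        // Brackets make the trailing space visible: prints [Hello World ]
        System.out.println("[" + stripNonAscii(givenString) + "]");
    }
}
```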

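Note that \P{M} is not a Unicode-aware version of \P{ASCII}. It matches every character that is not a Unicode combining mark, so replacing \P{M} with an empty string would delete almost all of the text. Its lowercase counterpart \p{M}, which matches the combining marks themselves, is the useful one: if accented letters should be reduced to their base letters instead of being dropped, the text can first be decomposed with java.text.Normalizer and the marks matched by \p{M} removed.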
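To illustrate how the mark class behaves on Unicode text, here is a hedged sketch that decomposes the string with java.text.Normalizer (form NFD), strips the combining marks matched by \p{M}, and then removes whatever non-ASCII characters remain. The class name AccentStripper and the methods stripAccents and toAscii are hypothetical names, and this is one possible approach, not the only one.

```java
import java.text.Normalizer;

class AccentStripper {
    // Splits accented letters into base letter + combining mark, then drops the marks.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Reduces accents first, then removes any character that is still non-ASCII.
    static String toAscii(String input) {
        return stripAccents(input).replaceAll("\\P{ASCII}", "");
    }

    public static void main(String[] args) {
        // Same sample text as above, written with Unicode escapes.
        String givenString = "\u00A9Hello\u2190\u2192\u21D2 \u00C3World \u00AE";
        // The accented A survives as a plain A: prints [Hello AWorld ]
        System.out.println("[" + toAscii(givenString) + "]");
    }
}
```

Compared with the earlier methods, the accented letter is kept as a plain "A" instead of disappearing, while symbols with no ASCII base letter are still removed.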

Method 3: Remove non-printable characters:

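You can also remove all non-printable characters with \P{Print}. By default, \p{Print} covers only the printable ASCII characters (0x20 to 0x7E), so this pattern removes every non-ASCII character and also the ASCII control characters, including \t, \n and \r.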

class Main {
    public static void main(String[] args) {
        String givenString = "©Hello←→⇒ ÃWorld ®";

        String finalString = givenString.replaceAll("\\P{Print}", "");

        System.out.println("Final string: "+finalString);
    }
}
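The difference between \P{ASCII} and \P{Print} only shows up when the input contains control characters. The following sketch compares the two on a short string with a tab and a line break; the class name PrintableFilter and both method names are assumptions for illustration.

```java
class PrintableFilter {
    // Removes non-ASCII characters but keeps ASCII control characters.
    static String keepAscii(String input) {
        return input.replaceAll("\\P{ASCII}", "");
    }

    // Removes everything except printable ASCII (space through tilde).
    static String keepPrintable(String input) {
        return input.replaceAll("\\P{Print}", "");
    }

    public static void main(String[] args) {
        // "A", tab, "B", carriage return, line feed, registered sign.
        String givenString = "A\tB\r\n\u00AE";
        // Tab, CR and LF survive here, so the length is 5.
        System.out.println(keepAscii(givenString).length());
        // Control characters are gone as well: prints [AB]
        System.out.println("[" + keepPrintable(givenString) + "]");
    }
}
```

Use \P{ASCII} when line breaks and tabs must be preserved, and \P{Print} when the result should be a single line of visible characters and spaces.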
