Remove Color from Messages






Parsing messages is difficult when they contain color characters. Make messages easier to store and parse by removing these characters.

Whether you are trying to parse messages on the fly or store them in a different format, you will notice that people who use colored messages throw a monkey wrench in the works. Adding color to messages means adding lots of spurious control characters. These have to be removed for the message to make sense to anything that isn't an IRC client.

If you take a raw message from an IRC channel and paste it directly onto a web page, it will appear quite different from the colored version in your IRC client. You will see no color at all. Instead, you will see the message with some extra characters sprinkled along it.

One particular situation in which it is useful to remove colors is when you are running an artificial intelligence bot, which learns by reading what other users send to the channel. Removing the special color characters is essential here; otherwise, the bot will get confused and end up speaking multicolored gibberish.

1 Simple Color Removal

Let's create some code to remove simple colors. Simple colors are marked by the control character 0x03 and are followed by one or two digits. The number after the control character should be between 0 and 15 inclusive, but may contain an optional leading zero to bulk it up to two digits. Most IRC clients treat any value (00-99) as a valid color, although only 0-15 are clearly defined.

An optional background color may be specified by appending a comma to the foreground color code. This is followed by another one- or two-digit code to specify the background color. You must also take this into account when you remove color codes from a message.
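The format described above can be made concrete with a small parser. The following is a minimal sketch, not part of the original hack: the function name `find_color_codes` and the sample message are hypothetical, and the pattern simply encodes "0x03, one or two digits, optional comma plus one or two digits".

```python
import re

# 0x03, a 1-2 digit foreground code, then an optional ",background"
# made of a comma and another 1-2 digit code.
COLOR_CODE = re.compile(r"\x03(?P<fg>[0-9]{1,2})(?:,(?P<bg>[0-9]{1,2}))?")


def find_color_codes(message):
    """Return (foreground, background) pairs; background is None if absent."""
    codes = []
    for match in COLOR_CODE.finditer(message):
        fg = int(match.group("fg"))
        bg = match.group("bg")
        codes.append((fg, int(bg) if bg is not None else None))
    return codes


# Hypothetical raw message: foreground 04 on "Hello", then 8-on-2 on "world".
raw = "\x0304Hello \x038,2world"
print(find_color_codes(raw))
```

Note that a comma is treated as part of the color code only when a digit follows it, which is why the removal code has to be careful not to swallow a comma that belongs to the message text.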

1.1 Perl solution

Using regular expressions, this is a trivial one-liner. The following line removes simple coloring from the input:

$input =~ s/\x03[0-9]{1,2}(,[0-9]{1,2})?//g;

1.2 Python solution

The Python regular expression module lets you apply the same replacement to a Python variable:

import re

# sub() returns a new string, so assign the result back to the variable.
input = re.compile("\x03[0-9]{1,2}(,[0-9]{1,2})?").sub("", input)
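To see how this replacement behaves on a few typical inputs, it may help to wrap it in a function. This is only a sketch: the name `strip_simple_colors` and the sample strings are made up for illustration.

```python
import re

# Same pattern as the one-liner: 0x03, 1-2 digits, optional ",digits".
SIMPLE_COLOR = re.compile("\x03[0-9]{1,2}(,[0-9]{1,2})?")


def strip_simple_colors(message):
    """Return the message with simple color codes removed."""
    return SIMPLE_COLOR.sub("", message)


print(repr(strip_simple_colors("\x034hello")))       # foreground only
print(repr(strip_simple_colors("\x0304,12hello")))   # foreground and background
print(repr(strip_simple_colors("\x034,hello")))      # comma is message text
print(repr(strip_simple_colors("\x03hello")))        # 0x03 with no digits
```

One thing to be aware of: because the pattern requires at least one digit, a 0x03 character that is not followed by a digit is left in the output. If that matters for your application, the pattern needs to make the digits optional.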

1.3 Java solution

Again, with regular expressions available in Java 1.4 and beyond, this is easy. To remove simple coloring from the input, just do this:

input = input.replaceAll("\u0003[0-9]{1,2}(,[0-9]{1,2})?", "");

1.4 Java Applet solution

All good Applets should run in Java 1.1, as there is rarely any guarantee that an end user will have anything more recent. Most browsers are supplied with a 1.1-compatible Virtual Machine without the user having to apply any updates.

Being restricted to Java 1.1 makes the process of color removal much more verbose. Although there are more lines of code, the result is no less efficient than a regular expression would be if one were available.

This method can be used to remove simple coloring from within a Java Applet:

    // A rather long but efficient way of removing colors in Java 1.1.
    public static String removeColors(String message) {
        int length = message.length();
        StringBuffer buffer = new StringBuffer();
        int i = 0;
        while (i < length) {
            char ch = message.charAt(i);
            if (ch == '\u0003') {
                i++;
                // Skip "x" or "xy" (foreground color).
                if (i < length) {
                    ch = message.charAt(i);
                    if (Character.isDigit(ch)) {
                        i++;
                        if (i < length) {
                            ch = message.charAt(i);
                            if (Character.isDigit(ch)) {
                                i++;
                            }
                        }
                        // Now skip ",x" or ",xy" (background color).
                        if (i < length) {
                            ch = message.charAt(i);
                            if (ch == ',') {
                                i++;
                                if (i < length) {
                                    ch = message.charAt(i);
                                    if (Character.isDigit(ch)) {
                                        i++;
                                        if (i < length) {
                                            ch = message.charAt(i);
                                            if (Character.isDigit(ch)) {
                                                i++;
                                            }
                                        }
                                    }
                                    else {
                                        // Keep the comma.
                                        i--;
                                    }
                                }
                                else {
                                    // Keep the comma.
                                    i--;
                                }
                            }
                        }
                    }
                }
            }
            else if (ch == '\u000f') {
                i++;
            }
            else {
                buffer.append(ch);
                i++;
            }
        }
        return buffer.toString();
    }

The PircBot API contains a removeColors method in the Colors class.
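The applet method above does slightly more than the regular expression one-liners: it also drops a 0x03 that has no digits after it, and it drops the 0x0f control character. If you want the same behavior from a regular expression, something like the following Python sketch should be close. The function name `strip_colors_like_applet` is hypothetical, and the pattern is an approximation written for this example rather than code taken from PircBot.

```python
import re

# 0x03 with an optional "fg" or "fg,bg" code, or a lone 0x0f.
# Making the whole code optional means a bare 0x03 is removed too.
COLOR_OR_RESET = re.compile(r"\x03(?:[0-9]{1,2}(?:,[0-9]{1,2})?)?|\x0f")


def strip_colors_like_applet(message):
    """Remove color codes, bare 0x03 characters, and 0x0f characters."""
    return COLOR_OR_RESET.sub("", message)


print(repr(strip_colors_like_applet("\x0304,12hi\x0f there")))
print(repr(strip_colors_like_applet("\x03hello")))
print(repr(strip_colors_like_applet("\x035,")))
```

As in the Java version, a comma that is not followed by a digit is kept, because it is part of the message text and not part of a background color.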


2 Hacking the Hack

If you have created an IRC bot that writes channel logs to a web page, why not try to retain the information contained in the coloring? One adventurous task would be to modify the methods here to create colored HTML from a message instead of simply removing all color. This is a much harder task than it first seems, so make sure you think about it before you start implementing anything.
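As a starting point, here is a rough Python sketch of one way to approach it. Everything in it is an assumption made for illustration: the function names, the choice of inline `<span>` styles, and especially the palette, which maps codes 0-15 to approximate CSS color names (real clients may render these codes differently). It handles only color codes and the 0x0f character, and ignores codes above 15.

```python
import html
import re

# Assumed palette: approximate CSS names for codes 0-15.
PALETTE = ["white", "black", "navy", "green", "red", "maroon", "purple",
           "orange", "yellow", "lime", "teal", "aqua", "blue", "fuchsia",
           "gray", "silver"]

# A color code (digits optional) or the 0x0f character.
TOKEN = re.compile(r"\x03(?:([0-9]{1,2})(?:,([0-9]{1,2}))?)?|\x0f")


def css_style(fg, bg):
    """Build an inline CSS style string for the current colors."""
    parts = []
    if fg is not None and fg < len(PALETTE):
        parts.append("color:" + PALETTE[fg])
    if bg is not None and bg < len(PALETTE):
        parts.append("background-color:" + PALETTE[bg])
    return ";".join(parts)


def colors_to_html(message):
    """Convert simple color codes in a message to HTML spans."""
    out = []
    fg = bg = None
    span_open = False
    pos = 0
    for match in TOKEN.finditer(message):
        # Emit the text before this code, escaped for HTML.
        out.append(html.escape(message[pos:match.start()]))
        pos = match.end()
        if span_open:
            out.append("</span>")
            span_open = False
        if match.group(1) is None:
            # Bare 0x03 or 0x0f: go back to the default colors.
            fg = bg = None
        else:
            fg = int(match.group(1))
            if match.group(2) is not None:
                bg = int(match.group(2))
        style = css_style(fg, bg)
        if style:
            out.append('<span style="%s">' % style)
            span_open = True
    out.append(html.escape(message[pos:]))
    if span_open:
        out.append("</span>")
    return "".join(out)


print(colors_to_html("\x034Hi\x03 <b>"))
```

Even this small sketch has to deal with HTML escaping, closing spans at the end of the message, and background colors that persist when only the foreground changes, which gives a sense of why the full task is harder than it looks.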

