
While improving this website's search engine, I wrote a routine that would extract the text of an article given its URL, strip out the HTML, and then convert every run of whitespace, carriage returns included, into a single space. This was done to reduce the size of the text, which was then stored in the database and used for full-text searches. To collapse all whitespace in a string, including newline characters, into single spaces, I used regular expressions (with some help from