Straighten Smart Quotes






Convert curly quotes, apostrophes, and other fancy typographical symbols back to their ASCII equivalents.

Have you ever copied a block of text from a web site and pasted it into a text editor (or into a weblog post of your own)? The text comes through, but all the apostrophes and quote marks end up as random-looking symbols. The web site uses fancy publishing software to produce smart quotes and apostrophes, but your text editor doesn't understand them. This hack dumbs down these fancy typographical symbols to their ASCII equivalents.

The Code

This user script runs on all pages. It constructs an object that maps each fancy character (by its Unicode representation) to a plain-text equivalent. Then, it gets a list of all the text nodes on the page and executes a search-and-replace on each node to convert each fancy character to its plain-text equivalent.

Learn more about Unicode at http://www.unicode.org.


In JavaScript, the replace method accepts a regular expression object as its first parameter. For performance reasons, we build all our regular expressions first, and then reuse them every time through the loop. If we had used the inline regular expression syntax, Firefox would need to rebuild each regular expression object every time through the loop, which is a significant performance drain on large pages!
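The build-once, reuse-many pattern described above can be sketched outside the browser. This is a minimal illustration rather than the user script itself: the two-entry table and the function names buildRegexes and straighten are hypothetical, chosen only for this example.

```javascript
// Hypothetical two-entry replacement table, for illustration only.
var replacements = {
    "\u2019": "'",   // right single quotation mark
    "\u2014": "--"   // em dash
};

// Build one global RegExp per key, once, before any text is processed.
function buildRegexes(table) {
    var regexes = {};
    for (var key in table) {
        regexes[key] = new RegExp(key, "g");
    }
    return regexes;
}

var regexes = buildRegexes(replacements);

// Reuse the prebuilt RegExp objects for every string passed in.
function straighten(text) {
    for (var key in replacements) {
        text = text.replace(regexes[key], replacements[key]);
    }
    return text;
}

console.log(straighten("they\u2019re here \u2014 finally"));
// prints: they're here -- finally
```

However many strings pass through straighten, only two RegExp objects are ever created.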

Save the following user script as dumbquotes.user.js:


	// ==UserScript==
	// @name		DumbQuotes
	// @namespace	http://diveintomark.org/projects/greasemonkey/
	// @description straighten curly quotes and apostrophes
	// @include		*
	// ==/UserScript==

	// Map each fancy character to its plain-text equivalent.
	var arReplacements = {
		"\xa0": " ",
		"\xa9": "(c)",
		"\xae": "(r)",
		"\xb7": "*",
		"\u2018": "'",
		"\u2019": "'",
		"\u201c": '"',
		"\u201d": '"',
		"\u2026": "...",
		"\u2002": " ",
		"\u2003": " ",
		"\u2009": " ",
		"\u2013": "-",
		"\u2014": "--",
		"\u2122": "(tm)"};

	// Build one global regular expression per character, once, up front.
	var arRegex = {};
	for (var sKey in arReplacements) {
		arRegex[sKey] = new RegExp(sKey, 'g');
	}

	// Collect every text node that is not inside a script or style element.
	var snapTextNodes = document.evaluate("//text()[" +
		"not(ancestor::script) and not(ancestor::style)]",
		document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);

	// Run every replacement over the text of each node.
	for (var i = snapTextNodes.snapshotLength - 1; i >= 0; i--) {
		var elmTextNode = snapTextNodes.snapshotItem(i);
		var sText = elmTextNode.data;
		for (var sKey in arReplacements) {
			sText = sText.replace(arRegex[sKey], arReplacements[sKey]);
		}
		elmTextNode.data = sText;
	}


Running the Hack

Before installing the user script, go to http://www.alistapart.com/articles/emen/. As shown in Figure, the fourth paragraph reads "But the larger problem is, now that they’re available, almost no one publishing on the web today knows how to use them—or often even knows of their existence." There are two fancy characters here: the apostrophe in the word "they’re" and the dash between "them" and "or."

Web page with fancy typography


Now, install the user script (Tools → Install This User Script) and refresh the page at http://www.alistapart.com/articles/emen/. As shown in Figure, the two fancy characters have been replaced with their ASCII equivalents. The apostrophe has been converted to a straight apostrophe, and the dash has been replaced with two hyphen characters.

Web page with plain typography
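If it is hard to tell by eye which characters in a passage are fancy, a small helper can list them. The sketch below is not part of the hack: findNonAscii is a hypothetical name, and it assumes the text contains only characters from the Basic Multilingual Plane, which covers every character in the replacement table.

```javascript
// Hypothetical helper: list the code point of every non-ASCII character.
// Assumes BMP characters only (charCodeAt sees surrogate pairs as two units).
function findNonAscii(text) {
    var found = [];
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        if (code > 0x7f) {
            var hex = code.toString(16).toUpperCase().padStart(4, "0");
            found.push("U+" + hex);
        }
    }
    return found;
}

// Two fragments of the sentence quoted above, with the original characters.
var before = "they\u2019re available / use them\u2014or often";
// The same fragments as they should read after the user script has run.
var after = "they're available / use them--or often";

console.log(findNonAscii(before).join(", "));
// prints: U+2019, U+2014
console.log(findNonAscii(after).length);
// prints: 0
```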


Although this hack currently focuses on typographical symbols, there is nothing typography-specific about it. It's just a generic script that does global search-and-replace on the text of a web page. By altering the arReplacements object, you can replace any character, word, or phrase with anything else, on any web page. Obviously, this could lead to all sorts of mischief, if you were so inclined. I will leave this one up to your imagination….
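One caveat when replacing words or phrases: the script passes each key straight to the RegExp constructor, so a key containing characters such as ".", "(", or "+" is interpreted as a pattern rather than as literal text. The following sketch shows one way to handle that, assuming you want every key matched literally. The names escapeRegex, phraseReplacements, and replacePhrases are hypothetical and do not appear in the original script.

```javascript
// Hypothetical helper: backslash-escape regular expression metacharacters
// so the key is matched as literal text.
function escapeRegex(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Illustrative phrase table; both keys contain metacharacters.
var phraseReplacements = {
    "e.g.": "for example",
    "(c)": "\u00a9"
};

// Build the regular expressions once, escaping each key first.
var phraseRegexes = {};
for (var key in phraseReplacements) {
    phraseRegexes[key] = new RegExp(escapeRegex(key), "g");
}

function replacePhrases(text) {
    for (var key in phraseReplacements) {
        text = text.replace(phraseRegexes[key], phraseReplacements[key]);
    }
    return text;
}

console.log(replacePhrases("eggs, e.g. toast (c)"));
// prints: eggs, for example toast ©
```

Without the escaping step, the unescaped pattern e.g. would also match "eggs", because each dot matches any character, and (c) would match every letter c, because the parentheses form a group.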

