June 16, 2011, 6:02 p.m.
posted by dm
A Line Break Is a Line Break
A line break is a line break is a line break, except when it's not. Surprisingly, there are three different types of line breaks in the modern computing world, and OS X uses two of the three.
One might think the innocent line break, that docile whitespace that tells us when paragraphs begin and end, would be a relatively simple piece of computer engineering. Unfortunately, there's more to the line break than meets the eye.
There are three different types of line breaks, each originally tied to one of the major operating systems: Windows/DOS, Macintosh, and Unix. A document using Mac line breaks would look horrid on a Windows system, and a document using Windows line breaks wouldn't be interpreted correctly on Unix either. The cause is the character, or characters, each system writes at the end of a line. The classic Mac, by default, uses a single carriage return (<CR>, ASCII 13), represented as \r. Unix, on the other hand, uses a single linefeed (<LF>, ASCII 10), \n. Windows goes one step further and uses both, creating a <CRLF> combination, \r\n.
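To make the three conventions concrete, here is a small sketch that writes the same two lines of text with each kind of line break and then dumps the bytes. The directory and file names are made up for this illustration; printf and od are standard tools, but treat this as a quick experiment rather than anything official.

```shell
# Write the same two-line text with each of the three line-break conventions.
# The temporary directory and file names are hypothetical, for illustration only.
dir=$(mktemp -d "${TMPDIR:-/tmp}/eol.XXXXXX")
printf 'one\rtwo\r'     > "$dir/mac.txt"      # classic Mac: CR only
printf 'one\ntwo\n'     > "$dir/unix.txt"     # Unix: LF only
printf 'one\r\ntwo\r\n' > "$dir/windows.txt"  # Windows/DOS: CR followed by LF

# Dump each file character by character; od -c shows CR as \r and LF as \n.
for f in mac unix windows; do
  echo "$f:"
  od -c "$dir/$f.txt"
done
```

The Windows file comes out two bytes longer than the other two, one extra byte per line break. Remove the directory with rm -r "$dir" when you're done.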
To make matters still more interesting, until OS X came along, OS-specific line breaks stayed in their own environment and didn't play nicely with others. Windows understood only its brethren, Unix cackled madly at anything else, and the Mac just grinned knowingly. OS X, however, understands both the original Mac line break and Unix line breaks.
This can cause confusion very easily, especially considering that most Mac applications (i.e., most anything that runs through the GUI of OS X) read and save using Mac-style line breaks, while anything used through the Terminal (like the common text editors [Hack #51]: vi, pico, and Emacs) enforces the Unix variety.
Thankfully, it's pretty easy to solve problems caused by this dual mentality. The first step is identifying that you have an issue. Say you have a text file you saved with SimpleText or a default installation of BBEdit. If you try to open that file in a shell editor like vi, you'll see this instead of what you'd expect:
This should be line one.^MThis should be on line two.
See that ugly ^M character stuck in the middle of our two sentences? That's the best vi (and most Unix applications) can do in an attempt to display a Mac line break: ^M is how they show a carriage return (Control-M). Likewise, if you open a text file crafted in vi with SimpleText, you'll see square boxes where there should be line breaks. Obviously, this wreaks havoc with any attempt at poetry, or system administration, for that matter.
There are a few solutions, depending on your skills and desires. The most obvious is to change your text editor to match what you'll be needing most frequently. If you're constantly going to be writing files that will be used in the shell, then set your text editor to save with Unix line breaks. A must-have editor, BBEdit (http://www.barebones.com/) from Bare Bones Software, allows you to do this quite easily, both on a file-by-file basis (see Figure) and globally through BBEdit's ultraconfigurable preferences (see Figure).
If Terminal-based text editors are more your cup of tea, a stronger version of vi called vim (for vi, improved) is flexible and infinitely configurable when it comes to editing files of varying formats. http://vim.sourceforge.net/htmldoc/usr_23.html provides more than enough detail on choosing your own line break.
If you want a less permanent option, a single command line can save you some hassle. Here, we've listed four simple Perl one-liners, each of which edits the named file in place. The first pair translates Mac line breaks to their Unix equivalent and back again; the second pair converts Windows line breaks to the Unix and Mac varieties, respectively. You'll notice that the line breaks are represented by the same characters we mentioned before:
perl -pi -e 's/\r/\n/g' file_with_mac_linefeeds.txt
perl -pi -e 's/\n/\r/g' file_with_unix_linefeeds.txt
perl -pi -e 's/\r\n/\n/g' file_with_win_linefeeds.txt
perl -pi -e 's/\r\n/\r/g' file_with_win_linefeeds.txt