Operating systems handle the line endings in text files in different ways. For DOS and Microsoft Windows, the end of a line is marked by a carriage return (CR) character followed by a line feed (LF) character.
The CR and LF characters were originally used on teletypewriters, also known as teleprinters, which were electromechanical typewriters used for telecommunications or to control early computers. Initially, a carriage return caused the cylinder on which the paper was held (the carriage) to return to the left side of the paper after a line of text had been typed, without advancing the paper to a new line; advancing the paper was the job of the line feed. Later, the carriage return would usually move the paper to the next line as well. The return key on a computer's keyboard is a descendant of the carriage return on those teletype machines, and in most word processors today, pressing it moves the cursor to the beginning of the next line.
If you are working on a text file, e.g., one with a .txt extension, on a DOS or Microsoft Windows system, pressing the return key inserts two characters into the file at that point: a carriage return (CR) character followed by a line feed (LF) character. Their hexadecimal representations are:
Description | Hex |
---|---|
Carriage Return (CR) | 0D |
Line Feed (LF) | 0A |
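To see these bytes for yourself, you can have printf write a line ending and dump it with od. This is a minimal sketch assuming a POSIX shell with the standard printf and od utilities; hex_bytes is a hypothetical helper name, and od's -A n and -t x1 options print bare hexadecimal bytes without offsets.

```shell
#!/bin/sh
# hex_bytes (hypothetical helper): print the bytes read from standard
# input as two-digit hexadecimal values separated by single spaces.
hex_bytes() {
    od -A n -v -t x1 |
        awk '{ for (i = 1; i <= NF; i++) printf "%s%s", (n++ ? " " : ""), $i }
             END { print "" }'
}

# "abc" with a DOS/Windows line ending: the last two bytes are CR and LF.
printf 'abc\r\n' | hex_bytes

# The same text with a Unix line ending: only LF follows the text.
printf 'abc\n' | hex_bytes
```

The first command ends with the bytes 0d 0a and the second with 0a alone, matching the table above.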
But on a Linux or Unix system, pressing return inserts only the LF character at the end of a line. This may have been due to a desire to save disk space on early Unix computers, when disk storage was much more limited than it is today.
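Apple's classic Mac OS (version 9 and earlier) used yet another convention: just the CR character marks the end of a line. Mac OS X, a Unix-based operating system with a heritage in BSD Unix, uses the Unix LF convention, but many applications carried over from classic Mac OS continued to save text files with CR line endings.
OS | Newline | Hexadecimal |
---|---|---|
DOS/Windows | CRLF | 0D 0A |
Linux/Unix | LF | 0A |
Classic Mac OS (9 and earlier) | CR | 0D |
Mac OS X | LF | 0A |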
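When you are not sure which of these conventions a file uses, you can count its line endings. The following is a sketch only: detect_eol is a hypothetical helper, and it assumes POSIX od and awk plus a mktemp that can be called without arguments (as on Linux and recent versions of Mac OS X). It dumps the file one byte at a time and classifies each line ending as CR LF, a lone CR, or a lone LF.

```shell
#!/bin/sh
# detect_eol (hypothetical helper): count the line endings in a file by
# convention: CR LF pairs (DOS/Windows), lone CRs (classic Mac OS) and
# lone LFs (Unix/Linux/Mac OS X).
detect_eol() {
    od -A n -v -t x1 "$1" | awk '
        {
            for (i = 1; i <= NF; i++) {
                b = tolower($i)
                if (prev == "0d") {
                    if (b == "0a") crlf++   # CR immediately followed by LF
                    else cr++               # the previous CR stood alone
                } else if (b == "0a") {
                    lf++                    # LF not preceded by CR
                }
                prev = b
            }
        }
        END {
            if (prev == "0d") cr++          # file ends with a lone CR
            printf "CRLF=%d CR=%d LF=%d\n", crlf, cr, lf
        }'
}

# Example: a temporary file written with Unix line endings.
example=$(mktemp)
printf '123\n456\n' > "$example"
detect_eol "$example"
rm -f "$example"
```

A file with only LF counts is in the Unix format, only CR counts suggests the classic Mac format, and only CRLF counts suggests DOS/Windows; mixed counts indicate a file that has been edited under more than one convention.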
So, many Mac applications will put just a CR at the end of each line when you save a file as a text file. However, if you edit a file from the command line on a Mac OS X system with a program such as vi, an editor that comes with Mac OS X but was originally developed for Unix, it will save the file with the LF (hex 0A) character at the end of each line.
For example, I can create a text file, test.txt, with vi and put just the following two lines in it:
123
456
If I examine the contents of the file with the od program, using the -c option to display ASCII characters or backslash escapes, I see the following:
GS01:Documents jsmith$ od -c test.txt
0000000    1   2   3  \n   4   5   6  \n
0000010
The \n at the end of each line represents a newline character. If I use the -ax option to see the ASCII and hexadecimal contents of the file, I see the following:
GS01:Documents jsmith$ od -ax test.txt
0000000    1   2   3  nl   4   5   6  nl
             3231    0a33    3534    0a36
0000010
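I see that the lines are terminated with the hexadecimal 0A character, i.e., the newline character. Note: in the hexadecimal representation that appears below the ASCII representation, the two bytes in each group are reversed, e.g., in 3231, 32 represents 2 and 31 represents 1. That is because od -x displays the data as two-byte words in the computer's native byte order, which is little-endian (low-order byte first) on Intel processors.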
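If you would rather see the bytes in the order they occur in the file, od's -t x1 option prints each byte separately instead of grouping them into two-byte words. The sketch below recreates test.txt with printf rather than vi, which is only an assumption made so the example runs non-interactively; the resulting file has the same contents.

```shell
#!/bin/sh
# Recreate the two-line test file without vi: "123" and "456", each
# terminated by a Unix newline (LF).
printf '123\n456\n' > test.txt

# Character view, as with od -c above.
od -c test.txt

# One byte per column, in the order the bytes appear in the file, so
# "1" (31) is listed before "2" (32) regardless of the machine's byte order.
od -t x1 test.txt
```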
If you need to convert a file that uses the Mac style of terminating lines with a CR character to the Linux/Unix style of using an LF character, you can use the following procedure within vi, taken from Using the shell (Terminal) in Mac OS X:
Type "1,$s/" and then press CTRL-V followed by CTRL-M. When you press CTRL-V nothing appears to happen, but the CTRL-M shows up as "^M". Continue with "/" and then CTRL-V again. Hit RETURN (which will show up as ^M and you could do that too - I just like it this way) and finally "/g". On your screen the whole thing looks like:
:1,$s/^M/^M/g
What does that mean? It means "Starting at line 1 and stopping at the end of the file (1,$), substitute (s) any CTRL-M (/^M/) with Unix CTRL-M (^M/) and do it for the entire line rather than just the first CTRL-M you find (g) (On most other Unixes I'd just do s/^M//g ; I don't know why Mac OS X didn't let me do that). It is a little strange that you replace ^M with ^M but get something entirely different, but that's a subject for another day. The morbidly curious can start by typing "man stty" if they need to know now.
You can then use :wq to save the file under the same name, or :wq newfilename.txt to save the converted version under a new name.
Alternatively, if you don't want to use the vi editor, you can use the following:
tr '\r' '\n' < file1 > file2
That uses the tr (translate) command to translate all instances of the carriage return character, represented by \r, to the newline character, in this case the LF character used on Unix systems.
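A quick way to try the translation is to build a small CR-terminated file with printf and inspect the result. In this sketch, file1 and file2 are just example names in the current directory.

```shell
#!/bin/sh
# Build a small file with classic Mac (CR-only) line endings.
printf '123\r456\r' > file1

# Translate every CR (\r) into LF (\n). tr only reads standard input,
# so the file is supplied with an input redirection.
tr '\r' '\n' < file1 > file2

# file2 should now show \n where file1 had \r.
od -c file1
od -c file2
```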
If you wish, you could also create a script, e.g., mac2unix, to perform the translation:
#!/bin/sh
test $# -eq 2 -a "$1" != "$2" && tr "\015" "\012" < "$1" > "$2" ||
echo "Usage: mac2unix f1 f2"
After making the script executable with chmod 755 mac2unix, you could use mac2unix file1 file2 to convert the contents of file1 to file2. (If the script is not in a directory on your PATH, run it as ./mac2unix file1 file2.)
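The "$1" != "$2" test in mac2unix matters because the shell truncates the output file when it sets up the > redirection, before tr has read anything, so using the same name for input and output would leave you with an empty file. If you want to convert a file in place, one option is to go through a temporary file. In this sketch, mac2unix_inplace is a hypothetical name, and it assumes a mktemp that can be called without arguments.

```shell
#!/bin/sh
# mac2unix_inplace (hypothetical): convert CR line endings to LF in place.
# The converted text goes to a temporary file first, because redirecting
# output to the input file would truncate it before tr reads it.
mac2unix_inplace() {
    if [ $# -ne 1 ]; then
        echo "Usage: mac2unix_inplace file" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # Copy the result back with cat so the original file keeps its
    # permissions and ownership (mv would replace it with the temp file).
    tr '\r' '\n' < "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Usage would be mac2unix_inplace file1, after which file1 itself contains LF line endings.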
On a Mac OS X system, I receive email messages from a Unix system that contain gpg-encrypted data. If I try to decrypt them on the Mac with gpg --decrypt file1.gpg >file2.txt, I receive the error message gpg: [don't know]: invalid packet (ctb=53). So I first need to convert file1.gpg with this procedure before running gpg to decrypt it.
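Before running gpg on a file like this, it can help to check whether the file contains any carriage returns at all. This is a minimal sketch; count_cr is a hypothetical helper name, and the file names in the commented usage line are only examples.

```shell
#!/bin/sh
# count_cr (hypothetical helper): print how many carriage return bytes a
# file contains. tr -dc deletes every character except CR, wc -c counts
# what is left, and the final tr strips the padding some wc versions add.
count_cr() {
    tr -dc '\r' < "$1" | wc -c | tr -d ' '
}

# Example on a temporary file with two CR-terminated lines.
sample=$(mktemp)
printf 'line one\rline two\r' > "$sample"
echo "CR characters: $(count_cr "$sample")"
rm -f "$sample"

# Possible use before decrypting (example file names):
# if [ "$(count_cr file1.gpg)" -gt 0 ]; then mac2unix file1.gpg file1-unix.gpg; fi
```

Note that a nonzero count also appears for DOS/Windows files, since CR LF endings contain a CR as well.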
If you need to convert a file on a Mac system to the text format used by DOS or Microsoft Windows, you could create a script, e.g., mac2dos, to perform the conversion:
#!/bin/sh
test $# -eq 2 -a "$1" != "$2" && { mac2unix "$1" "$2" && unix2dos "$2" "$2"; } ||
echo "Usage: mac2dos f1 f2"
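That script relies on the mac2unix script you created previously, which must be in a directory on your PATH, and on a unix2dos utility. unix2dos is not part of a standard Mac OS X installation, and its command-line syntax varies between implementations, so check how your version handles input and output file names.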
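If unix2dos is not available, the same conversion can be sketched with tr and awk alone. mac2dos_awk is a hypothetical name, and this is an alternative technique rather than the mac2unix/unix2dos approach above: tr turns each CR into LF so that awk sees ordinary lines, and awk then prints every line followed by CR LF.

```shell
#!/bin/sh
# mac2dos_awk (hypothetical): classic Mac (CR) to DOS/Windows (CR LF)
# without mac2unix or unix2dos. Note that awk also terminates a final
# line that had no line ending at all.
mac2dos_awk() {
    if [ $# -ne 2 ] || [ "$1" = "$2" ]; then
        echo "Usage: mac2dos_awk f1 f2" >&2
        return 1
    fi
    tr '\r' '\n' < "$1" | awk '{ printf "%s\r\n", $0 }' > "$2"
}
```

Usage mirrors the original script: mac2dos_awk file1 file2.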
To go the other way, e.g., from DOS/Windows or Unix to the Mac text format, you could use the following scripts:
dos2mac
#!/bin/sh
test $# -eq 2 -a "$1" != "$2" && tr -d "\012" < "$1" > "$2" ||
echo "Usage: dos2mac f1 f2"
unix2mac
#!/bin/sh
test $# -eq 2 -a "$1" != "$2" && tr "\012" "\015" < "$1" > "$2" ||
echo "Usage: unix2mac f1 f2"
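Because unix2mac, mac2unix, and dos2mac are each a single tr command, it is easy to check that they agree with one another. The sketch below is a hypothetical self-test, check_conversions, that assumes mktemp -d is available: it converts a small Unix file to the Mac format and back, converts a DOS file to the Mac format, and compares the results with cmp.

```shell
#!/bin/sh
# check_conversions (hypothetical): run the tr commands from unix2mac,
# mac2unix and dos2mac on small sample files in a temporary directory
# and report whether the results agree. Returns 0 if they do.
check_conversions() {
    dir=$(mktemp -d) || return 1
    printf '123\n456\n' > "$dir/orig.txt"                # Unix (LF)
    printf '123\r\n456\r\n' > "$dir/dos.txt"             # DOS/Windows (CR LF)

    tr '\012' '\015' < "$dir/orig.txt" > "$dir/mac.txt"  # as in unix2mac
    tr '\015' '\012' < "$dir/mac.txt" > "$dir/back.txt"  # as in mac2unix
    tr -d '\012' < "$dir/dos.txt" > "$dir/dos2mac.txt"   # as in dos2mac

    # Unix -> Mac -> Unix must reproduce the original, and DOS -> Mac
    # must give the same bytes as Unix -> Mac.
    cmp -s "$dir/orig.txt" "$dir/back.txt" &&
        cmp -s "$dir/mac.txt" "$dir/dos2mac.txt"
    status=$?
    rm -rf "$dir"
    return $status
}

if check_conversions; then
    echo "conversions agree"
else
    echo "conversions differ"
fi
```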
References
- Carriage return. Wikipedia, the free encyclopedia.
- Newline. Wikipedia, the free encyclopedia.
- Teleprinter. Wikipedia, the free encyclopedia.
- Using the shell (Terminal) in Mac OS X. Date: December 2002. MacOSX articles at APLawrence.com.
- Vi. Wikipedia, the free encyclopedia.
- Line Breaks. Date: July 1, 2003. By: Rodney Sparapani/Medical College of Wisconsin. The ESS-help Archives.
- Why is the line terminator CR+LF? Date: March 18, 2004. By: oldnewthing. The Old New Thing.