Monday, December 12, 2016

dos2unix

The text file I got from a colleague is full of ^M and crashes my parser running on Mac and Linux. This is because my very unfortunate colleague is still stuck with a Windows machine, and line breaks are encoded differently on these systems.

I downloaded the dos2unix util for Mac and it installed smoothly; however, for some reason it does not work: the ^M's are still there after running it. Bummer! Short on time, I had to roll my own solution. It is yet another one-liner:

$ cat dos2unix
#!/bin/bash
sed -i 's/^M/\n/g' "$1"

However, the main challenge is typing the ^M character correctly inside the script. It is NOT a literal ^ followed by a literal M, but rather a single control character.

  • In vi, you type: ctrl-v then ctrl-m
  • In emacs, you type: ctrl-q then ctrl-m
ctrl-v means you hold down the ctrl key and v key at the same time.
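
If typing the control character feels fragile, one way around it is to let tr do the work, since tr understands the \r and \n escapes by itself. Below is a minimal sketch, not the script I actually used. The function name cr2lf is made up for illustration, and it assumes every ^M in the file is meant to be a line break.

```shell
#!/bin/bash
# cr2lf: hypothetical helper name, not part of the original script.
# Turns every carriage return (\r, displayed as ^M) into a line feed (\n).
# tr interprets the \r and \n escapes itself, so no literal control
# character has to be typed into this file.
cr2lf() {
    # tr only reads stdin, so write the result to a temp file first.
    cr2lf_tmp="$(mktemp)" || return 1
    if LC_ALL=C tr '\r' '\n' < "$1" > "$cr2lf_tmp"; then
        # Copy the contents back so the original file keeps its permissions.
        cat "$cr2lf_tmp" > "$1"
        rm -f "$cr2lf_tmp"
    else
        rm -f "$cr2lf_tmp"
        return 1
    fi
}
```

Usage would be something like:

$ cr2lf filename

Note that this is only right when ^M is the whole line break. On a file where each line ends in ^M followed by a newline, it would leave an extra blank line after every line.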

Enjoy!

[Update 2016-12-21]:

Coming back to the problem, I inspected the problematic file more closely using

$ od -c filename

It turns out the line breaks are represented by \r alone, and I guess that's why the regular dos2unix does not work: it expects \r\n. Knowing this, another fix comes to mind (without having to deal with typing the ^M character):

$ sed -i -e 's/\r/\n/g' filename

Also, my docker-infested colleague suggested a more general way of running the standard dos2unix when it works, without having to download and install dos2unix in the hosting environment:

$ docker run --rm -it -v `pwd`:/data/ alpine dos2unix /data/filename

This assumes the file is in the current directory (`pwd`) and that docker is installed on the hosting environment. It basically launches a minimal docker container and uses the dos2unix that comes with it. However, it won't fix my specific file with its non-traditional line breaks.

Enjoy!
