The text file I got from a colleague is full of ^M characters and crashes my parser on Mac and Linux. The unfortunate colleague is still stuck with a Windows machine, and line breaks are encoded differently on these systems.
I downloaded the dos2unix utility for Mac and it installed smoothly, but for some reason it did not work: the ^M's were still there after running it. Bummer! Short on time, I had to roll my own solution. It is yet another one-liner:
$ cat dos2unix
#!/bin/bash
sed -i 's/^M/\n/g' "$1"
However, the main challenge is typing the ^M character correctly inside the script. It is NOT a literal ^ followed by a literal M, but a single control character (the carriage return).
- In vi, you type: ctrl-v then ctrl-m
- In emacs, you type: ctrl-q then ctrl-m
Enjoy!
[Update 2016-12-21]:
Coming back to the problem, I took a closer look at the problematic file using
$ od -c filename
It turns out the line break is represented by \r alone, and I guess that's why the regular dos2unix does not work: it expects \r\n. Knowing this, another fix comes to mind (without having to type the ^M character):
$ sed -i -e 's/\r/\n/g' filename
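If your sed does not interpret \r or \n the way you expect (GNU and BSD sed differ in places), here is a rough, more portable sketch using tr, which understands both escapes on its own. The function name cr_to_lf is made up for illustration, and it assumes the file uses bare \r line breaks, as mine did:

```shell
#!/bin/bash
# Hypothetical helper: rewrite bare CR line breaks as LF, in place.
# Assumption: the file uses CR only. A CRLF file would end up with
# doubled newlines, since every CR becomes an LF.
cr_to_lf() {
    local file=$1 tmp status
    tmp=$(mktemp) || return 1
    # tr translates the '\r' and '\n' escapes itself, so no literal
    # control character has to be typed into the script.
    tr '\r' '\n' < "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

After sourcing the function, usage would be: cr_to_lf filename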
Also, my Docker-obsessed colleague suggested a more general way to run the standard dos2unix, when it works, without having to download and install dos2unix on the host:
$ docker run --rm -it -v `pwd`:/data/ alpine dos2unix /data/filename
This assumes the file is in the current directory (`pwd`) and that Docker is installed on the host. It launches a minimal Alpine container and uses the dos2unix that ships with it. However, it won't fix my specific file with its nontraditional line breaks.
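Since the right tool depends on which line-break convention the file actually uses, a quick check before converting can save some head-scratching. Below is a minimal sketch that only counts \r and \n bytes; the function name eol_style and its labels are my own invention, and it cannot tell a clean \r\n file from one with mixed line breaks:

```shell
#!/bin/bash
# Hypothetical helper: guess the line-break convention of a file
# by counting CR and LF bytes.
eol_style() {
    local file=$1 cr lf
    # tr -cd deletes everything except the listed byte; wc -c counts what
    # is left. The final tr strips the padding some wc versions print.
    cr=$(tr -cd '\r' < "$file" | wc -c | tr -d ' ')
    lf=$(tr -cd '\n' < "$file" | wc -c | tr -d ' ')
    if [ "$cr" -eq 0 ] && [ "$lf" -eq 0 ]; then
        echo "none"
    elif [ "$cr" -eq 0 ]; then
        echo "LF"              # Unix style
    elif [ "$lf" -eq 0 ]; then
        echo "CR"              # bare carriage returns, like my file
    else
        echo "CRLF or mixed"   # both bytes present; inspect with od -c
    fi
}
```

For a file like mine, eol_style filename should report CR, which is the hint that a tool expecting \r\n will not help.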
Enjoy!