I exported a CSV file from MSSQL and needed to read it in Python. However, there was a problem: Python seemed to be reading it as binary.
file.readlines() returned lines like this:
'\xff\xfe2\x000\x001\x003\x00-\x001\x000\x00-\x001\x000\x00 \x000\x000\x00:\x000\x002\x00:\x000\x000\x00,\x00i\x00n\x00s\x00t\x00a\x00l\x00l\x00 \x002\x000\x001\x003\x00,\x00t\x00o\x00o\x00k\x00 \x00r\x00a\x00\r\x00\n'
When I ran the file command on it, I got:
Little-endian UTF-16 Unicode text, with very long lines, with CRLF, CR line terminators
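The \xff\xfe at the start is the UTF-16 little-endian byte order mark (BOM), and every ASCII character is followed by a \x00 byte. As a quick check, here is a minimal Python 3 sketch (the variable names are my own) that decodes the first line shown above with the utf-16 codec. One catch: reading the raw bytes line by line splits at the 0x0A byte of the UTF-16 newline (0A 00), so the trailing \x00 ends up at the start of the next line and has to be put back before decoding.

```python
# First line of the export exactly as readlines() returned it (as bytes).
line = (b'\xff\xfe2\x000\x001\x003\x00-\x001\x000\x00-\x001\x000\x00 \x00'
        b'0\x000\x00:\x000\x002\x00:\x000\x000\x00,\x00'
        b'i\x00n\x00s\x00t\x00a\x00l\x00l\x00 \x002\x000\x001\x003\x00,\x00'
        b't\x00o\x00o\x00k\x00 \x00r\x00a\x00\r\x00\n')

# The UTF-16 LE newline is the two bytes 0A 00; splitting on 0A left the 00
# behind, so restore it to get an even number of bytes.
raw = line + b'\x00'

# 'utf-16' reads the FF FE byte order mark, selects little-endian, and drops the BOM.
text = raw.decode('utf-16')
print(repr(text))

# 'utf-16-le' fixes the byte order up front, so the BOM survives as U+FEFF.
print(repr(raw.decode('utf-16-le')[0]))
```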
When I tried the dos2unix command, I got this error:
dos2unix: Binary symbol 0x000B found at line 9419305
dos2unix: Skipping binary file calls.csv
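dos2unix skips files it considers binary, and 0x000B is the vertical tab control character, so the export contains at least one of them. If you want to see where they occur, a minimal sketch like the following lists them after decoding. find_control_lines is a hypothetical helper of my own, and its line numbers may not match the ones dos2unix reports, since the file also has bare CR line terminators.

```python
from typing import Iterable, List


def find_control_lines(lines: Iterable[str], char: str = '\x0b') -> List[int]:
    """Return the 1-based numbers of the lines that contain `char`.

    Defaults to vertical tab (U+000B), the symbol dos2unix reported.
    """
    return [number for number, text in enumerate(lines, start=1) if char in text]
```

To run it on the export, open the file as decoded text, e.g. open('calls.csv', encoding='utf-16', newline='\n'), and pass the file object in; newline='\n' makes Python split only on LF, so bare CRs stay inside their lines.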
So I used the iconv command to convert the file to UTF-8:
iconv -f utf-16 -t utf-8 input_file > output_file
And it worked! Now Python reads it properly.
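If you would rather skip the separate conversion step, Python 3 can also decode UTF-16 directly. The sketch below (both function names are hypothetical) opens the file with encoding='utf-16', which uses the BOM to pick the byte order and strips it, and hands the result to the csv module with newline='' so csv handles the CRLF terminators itself. convert_to_utf8 is only a rough Python counterpart of the iconv command, not a replacement for it.

```python
import csv
import shutil


def read_utf16_csv(path):
    """Parse a UTF-16 CSV (with BOM) into a list of rows."""
    # newline='' is what the csv module expects, so it sees the raw \r\n.
    with open(path, encoding='utf-16', newline='') as f:
        return list(csv.reader(f))


def convert_to_utf8(src, dst):
    """Roughly: iconv -f utf-16 -t utf-8 src > dst"""
    # Decode as UTF-16 (BOM consumed) and re-encode as UTF-8 without a BOM;
    # newline='' on both sides leaves line endings untouched.
    with open(src, encoding='utf-16', newline='') as fin, \
         open(dst, 'w', encoding='utf-8', newline='') as fout:
        shutil.copyfileobj(fin, fout)
```

Reading directly avoids keeping a second copy of a large export on disk; converting once is handier if other tools also need the file.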