When Filenames Attack
Posted: Sun Aug 22, 2004 4:59 pm
by skanks
So my brother was in Russia, and he has lots of files with Cyrillic filenames. One of these was a zip file that neither WinZip nor Windows could open; both claimed the file was corrupted or invalid. After I changed the filename to "a.zip", it unzipped fine.
Why does the filename have any effect on the perceived validity of the file? Shouldn't the filesystem manage those details, and shouldn't they be transparent to the application?
Posted: Sun Aug 22, 2004 8:32 pm
by VLSmooth
Heh, I run into this problem a lot with Japanese files whose names aren't Shift-JIS encoded.
Renaming became a habit, so I never put much thought into it. If I had to guess, it's an early-bailout mechanism adopted by most (de)compression programs for the "extract to folder..." option: if a program can't read the filename, how can it make a directory with it?
Posted: Mon Sep 13, 2004 7:37 am
by bob
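The problem could stem from the kind of strings the program uses and supports. If it uses plain narrow strings (ASCII or a single legacy code page), then I doubt the standard C fopen function is going to enjoy Cyrillic or Japanese filenames. (fread and fwrite only ever see the already-open FILE* handle, so it's the open call that fails.) If, however, it keeps filenames as Unicode strings (UTF-8, or on Windows the UTF-16 "wide" strings that functions such as _wfopen and CreateFileW take) and hands them to an OS-level file function that accepts Unicode names, then it should work fine.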
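A minimal Python sketch of that failure mode, assuming the program hands the OS a narrow byte-string filename in a single legacy code page. The filename is hypothetical, and cp1251 (Cyrillic Windows) and cp1252 (Western European Windows) are illustrative choices of code page, not what any particular program actually uses.

```python
def narrow_name(filename, codepage):
    """Return filename as a narrow byte string in `codepage`, or None
    if the code page has no byte values for some of its characters."""
    try:
        return filename.encode(codepage)
    except UnicodeEncodeError:
        return None

# Hypothetical Cyrillic archive name.
name = "привет.zip"

utf8 = narrow_name(name, "utf-8")          # UTF-8 can represent any Unicode name
cp1251 = narrow_name(name, "cp1251")       # a Cyrillic code page covers it too
cp1252 = narrow_name(name, "cp1252")       # a Western code page cannot: None
renamed = narrow_name("a.zip", "cp1252")   # plain ASCII fits every code page

print("utf-8: ", utf8)
print("cp1251:", cp1251)
print("cp1252:", cp1252)
print("a.zip: ", renamed)
```

On a system whose narrow code page behaves like cp1252, a narrow-string program has no faithful byte sequence to pass to fopen for the Cyrillic name, while "a.zip" encodes identically everywhere, which fits renaming the file making the problem go away.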
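UTF-8 is a pretty cool method of storing Unicode. Every byte value from 0 to 127 is a plain ASCII character, so ASCII text is already valid UTF-8. A byte with bit 7 set is part of a multi-byte sequence: the lead byte's leading 1 bits give the length (110xxxxx for two bytes, 1110xxxx for three, 11110xxx for four), and each continuation byte has the form 10xxxxxx and contributes 6 more bits of the character value. Growing a value by 7 bits per byte until you reach a byte with the high bit cleared is a different scheme, a variable-length integer ("varint") encoding, and something along those lines is what I was considering earlier today, since I decided to do a little research into developing some simple lossless audio compression.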
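A minimal Python sketch contrasting the two layouts. UTF-8 marks the sequence length in the lead byte and carries 6 bits per continuation byte, whereas a 7-bits-per-byte varint (the unsigned LEB128 layout) just sets the high bit on every byte except the last. The function names are hypothetical and the UTF-8 encoder is hand-rolled for illustration, checked against Python's built-in `str.encode`; it is not a replacement for the real codec.

```python
def utf8_encode(cp):
    """Encode one Unicode code point as UTF-8 (hand-rolled for illustration)."""
    if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        raise ValueError("not a valid Unicode scalar value")
    if cp < 0x80:                      # 0xxxxxxx: plain ASCII, one byte
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx: 11 payload bits
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx + 2 continuation bytes: 16 bits
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx + 3 continuation bytes: 21 payload bits
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def varint_encode(n):
    """Unsigned varint: 7 bits per byte, low bits first,
    high bit set means 'more bytes follow'."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)    # more 7-bit groups remain
        else:
            out.append(byte)           # high bit clear: last byte
            return bytes(out)

def varint_decode(data):
    """Inverse of varint_encode; stops at the first byte with bit 7 clear."""
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value
    raise ValueError("truncated varint")

print(utf8_encode(0x43F).hex())   # Cyrillic "п": d0bf
print(varint_encode(300).hex())   # ac02
```

The varint layout is simpler to decode one byte at a time, but unlike UTF-8 you can't tell from a byte in the middle of a stream where the current value started, which is one reason UTF-8 spends its extra bits on the 110/1110/11110 and 10 prefixes.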