Fix Incorrect Filename Encoding
A friend asked me for help fixing incorrect filename encoding. By default, most operating systems create filenames in UTF-8 encoding, which is the right choice. However, in some cases applications create filenames in another encoding, e.g., TIS-620 for Thai, and the result is unreadable filenames. This often happens when you download files and the application saves them under the same name on the local computer; the encoding may be ISO-8859-1. It is possible to prevent the problem by adjusting the configuration of those applications. Here, though, the goal is to fix it after it has already happened.
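To make the failure mode concrete, here is a minimal sketch of how a name written in one encoding turns unreadable when it is interpreted in another, and how reversing the two steps recovers it. It uses Python 3 syntax; the sample Thai name and the helper names `garble` and `repair` are assumptions made up for illustration.

```python
# Python 3 sketch. The sample name and helper names are illustrative assumptions.
ORIGINAL = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35.txt"  # a Thai word followed by ".txt"


def garble(name, real_encoding, wrong_encoding):
    """Encode with the real encoding, then decode with the wrong one."""
    return name.encode(real_encoding).decode(wrong_encoding)


def repair(garbled, wrong_encoding, real_encoding):
    """Undo garble(): recover the raw bytes, then decode them correctly."""
    return garbled.encode(wrong_encoding).decode(real_encoding)


if __name__ == "__main__":
    broken = garble(ORIGINAL, "tis-620", "iso-8859-1")
    print(ascii(broken))  # the unreadable form, shown with escapes
    print(repair(broken, "iso-8859-1", "tis-620") == ORIGINAL)
```

The repair only works when no bytes were lost along the way, which is why knowing (or guessing) the right pair of encodings matters.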
Fortunately, Python has all the functions needed to fix this problem easily. I wrote a short, simple script (for Python 2) as follows.
#!/usr/bin/env python
# Note: this script targets Python 2 (print statement, str.decode).
import sys
import os
from optparse import OptionParser


class Application:
    def __init__(self):
        self.parse_args()

    def parse_args(self):
        parser = OptionParser(usage='usage: %prog [options] path ...')
        parser.add_option('-v', '--verbose', dest='verbose', default=False,
                          action="store_true", help='verbose')
        parser.add_option('-i', '--in', dest='in_encoding', default='utf-8',
                          help='input encoding (default=utf-8)')
        parser.add_option('-o', '--out', dest='out_encoding', default='tis-620',
                          help='output encoding (default=tis-620)')
        parser.add_option('-r', '--recursive', dest='recursive', default=False,
                          action="store_true", help='recursive')
        parser.add_option('--dryrun', dest='dryrun', default=False,
                          action="store_true", help='dry run')
        self.options, self.args = parser.parse_args()

    def verbose(self, message):
        if self.options.verbose:
            print message

    def error(self, message):
        print >> sys.stderr, 'ERROR: %s' % message

    def run(self):
        for path in self.args:
            self.process(path)

    def process(self, path):
        if self.options.recursive:
            # Walk bottom-up so children are renamed before their parents.
            for root, dirs, files in os.walk(path, topdown=False):
                for name in files:
                    self.rename(os.path.join(root, name))
                for name in dirs:
                    self.rename(os.path.join(root, name))
        self.rename(path)

    def rename(self, path):
        path = os.path.realpath(path)
        dirname = os.path.dirname(path)
        src = os.path.basename(path)
        # Re-encode only the last path component; bad characters are replaced.
        dest = src.decode(self.options.in_encoding, 'replace').encode(
            self.options.out_encoding, 'replace')
        src = os.path.join(dirname, src)
        dest = os.path.join(dirname, dest)
        self.verbose("rename '%s' to '%s'" % (src, dest))
        try:
            if not self.options.dryrun:
                os.rename(src, dest)
        except OSError, why:
            self.error(str(why))


if __name__ == '__main__':
    app = Application()
    app.run()
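The heart of the script is the decode/encode round trip in `rename()`. As a minimal sketch of just that step in Python 3 syntax (the script itself is Python 2), here is a helper that works on raw bytes. The name `convert_name` is hypothetical and not part of the original script.

```python
def convert_name(raw_name, in_encoding="utf-8", out_encoding="tis-620"):
    """Re-encode a raw filename (bytes) from in_encoding to out_encoding.

    Like the script, both steps use the 'replace' error handler, so bytes
    that cannot be decoded and characters that cannot be encoded end up
    as '?'. The conversion is therefore lossy for such names.
    """
    text = raw_name.decode(in_encoding, "replace")
    return text.encode(out_encoding, "replace")


if __name__ == "__main__":
    thai_utf8 = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35".encode("utf-8")
    print(convert_name(thai_utf8))
```

Because of the `'replace'` handler, two different names can collapse into the same result, so a dry run before the real rename is worthwhile.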
To make it as useful as possible, I wrote it with a few options.
usage: fixencoding.py [options] path ...

options:
  -h, --help            show this help message and exit
  -v, --verbose         verbose
  -i IN_ENCODING, --in=IN_ENCODING
                        input encoding (default=utf-8)
  -o OUT_ENCODING, --out=OUT_ENCODING
                        output encoding (default=tis-620)
  -r, --recursive       recursive
  --dryrun              dry run
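For readers who prefer `argparse`, the newer standard-library parser, the same options could be declared roughly as below. This is a sketch in Python 3 syntax, not the script's actual code; the function name `build_parser` and the `paths` attribute are assumptions.

```python
import argparse


def build_parser():
    """Declare the same options as the script, using argparse."""
    parser = argparse.ArgumentParser(prog="fixencoding.py")
    parser.add_argument("-v", "--verbose", action="store_true", help="verbose")
    parser.add_argument("-i", "--in", dest="in_encoding", default="utf-8",
                        help="input encoding (default=utf-8)")
    parser.add_argument("-o", "--out", dest="out_encoding", default="tis-620",
                        help="output encoding (default=tis-620)")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="recursive")
    parser.add_argument("--dryrun", action="store_true", help="dry run")
    # Zero or more paths, matching "path ..." in the usage line.
    parser.add_argument("paths", nargs="*", metavar="path")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```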
For example, you can recursively convert utf-8 to tis-620 in C:\Downloads as follows.
python fixencoding.py -i utf-8 -o tis-620 -r C:\Downloads
If you are not sure about the input or output encoding, you can try it first with --dryrun and --verbose.
python fixencoding.py -i utf-8 -o tis-620 -r -v --dryrun C:\Downloads
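When the input encoding is unknown, another way to narrow it down is to try decoding a raw name strictly with a few candidates and look at the ones that succeed. The sketch below (Python 3 syntax) is an assumption-laden helper, not part of the script: `candidate_decodings` and its default list of encodings are made up for illustration.

```python
def candidate_decodings(raw_name, encodings=("utf-8", "tis-620", "iso-8859-1")):
    """Return {encoding: decoded text} for encodings that decode raw_name strictly."""
    results = {}
    for encoding in encodings:
        try:
            results[encoding] = raw_name.decode(encoding)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            pass
    return results


if __name__ == "__main__":
    raw = b"\xca\xc7\xd1\xca\xb4\xd5.txt"  # sample bytes, assumed to be TIS-620
    for encoding, text in candidate_decodings(raw).items():
        print(encoding, ascii(text))
```

A successful decode does not prove the guess is right: ISO-8859-1 accepts every byte sequence, so you still have to judge by eye which result is a readable name.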
- sugree's blog