python2.7 - Chinese text comes out garbled after Python writes it to a file
Problem description
A very simple little crawler program:
import urllib2

for i in L:
    content = urllib2.urlopen('http://X.X.X.X/cgi-bin/GetDomainOwnerInfo?domain=%s' % i)
    html = content.read()  # raw bytes exactly as the server sent them
    with open('domain_test.xml', 'a') as f:
        f.write(html)
    print html
The print output shows the Chinese correctly:
<domaininfo strDomain='XXX.com.' strOwner='XXX' strDepartment='云平台部' strBusiness='[互联网业务系统 - XXX' strUser='XXX;'>
But when I open the XML file directly, the text is garbled:
<domaininfo strDomain='XXX.com.' strOwner='XXX' strDepartment='云平å°éƒ¨' strBusiness='[互è”网业务系统 - 第三方应用]' StrUser='XXX;'>
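Windows 7 operating system, Python 2.7
Could anyone tell me how to solve this problem?
Answers
Answer 1: You need to know how content is encoded, and consider whether it needs to be converted.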
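One way to check the encoding is to decode the raw bytes yourself and compare the results. The sketch below assumes the server sends UTF-8, which is what the garbled sample suggests (UTF-8 bytes displayed as a single-byte Western encoding). Here `raw` is a hypothetical stand-in for `content.read()`, and latin-1 is used only to imitate a viewer that guesses the encoding wrongly.

```python
# Hypothetical stand-in for html = content.read(): the UTF-8 bytes of the
# department name from the question, written with escapes so the source
# file itself stays pure ASCII.
text = u'\u4e91\u5e73\u53f0\u90e8'
raw = text.encode('utf-8')

# Correct guess: the bytes decode back to the original four characters.
as_utf8 = raw.decode('utf-8')

# Wrong guess: latin-1 never fails, but it turns every byte into its own
# character, which is the kind of garbling shown in the question.
as_latin1 = raw.decode('latin-1')

print(as_utf8 == text)
print(len(as_latin1))
```

If decoding with one encoding raises UnicodeDecodeError or gives nonsense, the bytes are probably in a different encoding and should be decoded with that one instead.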
You need to open the file as utf-8 and then write to it.
codecs.open(filename, mode[, encoding[, errors[, buffering]]])
Open an encoded file using the given mode and return a wrapped version providing transparent encoding/decoding. The default file mode is 'r' meaning to open the file in read mode.
Note: The wrapped version will only accept the object format defined by the codecs, i.e. Unicode objects for most built-in codecs. Output is also codec-dependent and will usually be Unicode as well.
Note: Files are always opened in binary mode, even if no binary mode was specified. This is done to avoid data loss due to encodings using 8-bit values. This means that no automatic conversion of '\n' is done on reading and writing.
encoding specifies the encoding which is to be used for the file.
errors may be given to define the error handling. It defaults to 'strict' which causes a ValueError to be raised in case an encoding error occurs.
buffering has the same meaning as for the built-in open() function. It defaults to line buffered.
import codecs
f = codecs.open('domain_test.xml', 'w', 'utf-8')
Answer 2: Try adding # -*- coding: utf-8 -*- at the top of the file.
Answer 3: Add #coding:utf-8 at the top of the file.
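To make Answer 1 concrete, here is a minimal sketch of decoding the response and writing it through codecs.open. It assumes the server's bytes are UTF-8; `append_as_utf8` and its `source_encoding` parameter are hypothetical names, `raw` stands in for `content.read()`, and a temporary directory is used so the example does not touch real files. Note that the coding line suggested in Answers 2 and 3 declares the encoding of the source file itself; it does not change which bytes are written to disk.

```python
import codecs
import os
import tempfile


def append_as_utf8(raw, path, source_encoding='utf-8'):
    """Decode raw bytes to unicode, then append the text to path as UTF-8."""
    text = raw.decode(source_encoding)  # bytes -> unicode
    with codecs.open(path, 'a', 'utf-8') as f:
        f.write(text)  # the codecs wrapper encodes unicode -> UTF-8


# Hypothetical stand-in for html = content.read().
raw = u"<domaininfo strDepartment='\u4e91\u5e73\u53f0\u90e8'>\n".encode('utf-8')

path = os.path.join(tempfile.mkdtemp(), 'domain_test.xml')
append_as_utf8(raw, path)

# Read the file back as UTF-8 to confirm the text survived the round trip.
with codecs.open(path, 'r', 'utf-8') as f:
    round_trip = f.read()

print(round_trip == raw.decode('utf-8'))
```

On Python 2, passing the undecoded byte string straight to a codecs.open file triggers an implicit ASCII decode and raises UnicodeDecodeError on the Chinese bytes, which is why the explicit decode comes first. If the source bytes are already UTF-8, the file ends up with the same bytes as in the question, so any remaining garbling likely comes from the viewer guessing the wrong encoding; writing with the 'utf-8-sig' codec (which adds a BOM) is one thing to try in that case.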