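What is the right way to process file contents in Python?
Problem description
Hi all,
I want to take everything between the first <link and the second <link in an htm file and save it as a separate htm file. What is a concise way to write this? The input looks like this: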
<meta http-equiv='X-UA-Compatible' content='IE=edge'><link rel='prefetch' href='https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js'><meta name='application-name' content='Python.org'><meta name='msapplication-tooltip' content='The official home of the Python Programming Language'><meta name='apple-mobile-web-app-title' content='Python.org'><meta name='apple-mobile-web-app-capable' content='yes'><meta name='apple-mobile-web-app-status-bar-style' content='black'><meta name='viewport' content='width=device-width, initial-scale=1.0'><meta name='HandheldFriendly' content='True'><meta name='format-detection' content='telephone=no'><meta http-equiv='cleartype' content='on'><meta http-equiv='imagetoolbar' content='false'><script type='text/javascript' async='' src='https://ssl.google-analytics.com/ga.js'></script><script src='https://www.haobala.com/wenda/Welcome to Python.org_files/modernizr.js.下载'></script><style type='text/css' adt='123'></style><link href='https://www.haobala.com/wenda/Welcome to Python.org_files/style.css' rel='stylesheet' type='text/css'><link href='https://www.haobala.com/wenda/Welcome to Python.org_files/mq.css' rel='stylesheet' type='text/css' media='not print, braille, embossed, speech, tty'>
The extracted content should be:
<link rel='prefetch' href='https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js'><meta name='application-name' content='Python.org'><meta name='msapplication-tooltip' content='The official home of the Python Programming Language'><meta name='apple-mobile-web-app-title' content='Python.org'><meta name='apple-mobile-web-app-capable' content='yes'><meta name='apple-mobile-web-app-status-bar-style' content='black'><meta name='viewport' content='width=device-width, initial-scale=1.0'><meta name='HandheldFriendly' content='True'><meta name='format-detection' content='telephone=no'><meta http-equiv='cleartype' content='on'><meta http-equiv='imagetoolbar' content='false'><script type='text/javascript' async='' src='https://ssl.google-analytics.com/ga.js'></script><script src='https://www.haobala.com/wenda/Welcome to Python.org_files/modernizr.js.下载'></script><style type='text/css' adt='123'></style><link
Answers
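Answer 1:

Using a regular expression:

    import re

    text = ''
    with open('read.html', 'r') as rf:
        text = rf.read()

    # [\s\S] matches any character, including newlines;
    # *? is non-greedy, so the match stops at the second '<link'.
    pattern = r'<link[\s\S]*?<link'
    results = re.findall(pattern, text)
    if results:
        r = results[0]
        with open('write.html', 'w') as wf:
            wf.write(r)

================================================

Or reading line by line (this assumes every tag starts on its own line, so it does not apply when the whole markup sits on a single line):

    with open('read.html', 'r') as rf:
        with open('write.html', 'w') as wf:
            num = 0  # number of lines seen so far that start with '<link'
            for line in rf:
                if line.startswith('<link'):
                    num += 1
                    if num == 2:
                        break  # stop at the second <link line
                if num == 1:
                    # copy the first <link line and everything after it
                    wf.write(line)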
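
For comparison, here is a minimal sketch of the same extraction without regular expressions, using only `str.find`. The function name `extract_between_links` and the short sample string are made up for illustration and are not part of the original answer; the sketch assumes the file has already been read into a string, as in the first snippet above.

```python
def extract_between_links(text, marker='<link'):
    """Return the text from the first marker through the second marker.

    Returns None when the marker occurs fewer than two times.
    """
    first = text.find(marker)
    if first == -1:
        return None
    # Start searching just past the first marker so it is not matched again.
    second = text.find(marker, first + len(marker))
    if second == -1:
        return None
    # Include the second marker itself, matching the expected output.
    return text[first:second + len(marker)]


if __name__ == '__main__':
    # Hypothetical, shortened input in the same shape as the question's markup.
    sample = "<meta a='1'><link rel='prefetch'><meta b='2'><link href='x.css'>"
    print(extract_between_links(sample))
    # prints: <link rel='prefetch'><meta b='2'><link
```

To use it on a file, read `read.html` into a string, pass it to the function, and write the result to `write.html` when it is not None. Like the regex version, this treats the markup as plain text and does not parse HTML.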
