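Merge all XML files in a folder and extract links
小柯 2022-12-19 2025-05-18

Preface
output_file: the name and path of the file where the results are stored.
xml_directory: the directory containing all the XML files to read.
download_links: an empty list, created to collect every link found.
file_path: the full path to each file, built from the directory and the filename.
content: the contents read from the file.
links_found: all links containing '/downloads/', extracted from the file contents with a regular expression.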
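To make the links_found step concrete, here is a minimal sketch of the extraction run on an in-memory string. The sample sitemap fragment, the example.com URLs, and the helper name extract_download_links are all made up for illustration. The pattern uses [^\s<] instead of [^\s], an assumption on my part so that a match cannot run across several <loc> elements when they sit on one line.

```python
import re

# Hypothetical sitemap fragment, invented for this example only.
sample = """<urlset>
  <url><loc>https://example.com/downloads/tool-a/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
  <url><loc>https://example.com/downloads/tool-b/</loc></url>
</urlset>"""

# Matches a whole <loc> element whose URL contains /downloads/ and ends with "/".
# [^\s<] stops at whitespace and at "<", so one match stays inside one element.
LOC_PATTERN = re.compile(r'<loc>https?://[^\s<]+/downloads/[^\s<]+/</loc>')


def extract_download_links(content):
    """Return matching URLs with the surrounding <loc> tags stripped."""
    return [re.sub(r'<[^>]+>', '', match) for match in LOC_PATTERN.findall(content)]


links = extract_download_links(sample)
print(links)
```

Only the two /downloads/ entries survive; the /about/ entry is skipped because it never matches the pattern.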
Merge all XML files and save to a txt file
import os
import re

# Where the results are written, and where the XML files live.
output_file = 'download_links.txt'
xml_directory = r'C:\Users\26370\Desktop\新建文件夹'

download_links = []

for filename in os.listdir(xml_directory):
    if filename.endswith('.xml'):
        file_path = os.path.join(xml_directory, filename)
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
        # [^\s<] keeps each match inside a single <loc> element, even when
        # the XML is minified onto one line.
        links_found = re.findall(r'<loc>https?://[^\s<]+/downloads/[^\s<]+/</loc>', content)
        download_links.extend(links_found)

# Remove duplicates (the resulting order is not guaranteed).
download_links = list(set(download_links))

with open(output_file, 'w', encoding='utf-8') as output:
    for link in download_links:
        # Strip the <loc> tags, leaving only the URL.
        cleaned_link = re.sub(r'<[^>]+>', '', link)
        output.write(cleaned_link + '\n')

print(f"Found {len(download_links)} links containing '/downloads/'; saved to file: {output_file}")
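The regular expression only matches URLs that end with a trailing slash, and it treats the XML as plain text. As an alternative, not the method used in this post, the sketch below reads the <loc> elements with the standard library's xml.etree.ElementTree and filters them with a substring check. The sample data and the function names loc_urls and find_download_links are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap content; the URLs are invented for illustration.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/downloads/tool-a/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
  <url><loc>https://example.com/downloads/tool-b</loc></url>
  <url><loc>https://example.com/downloads/tool-a/</loc></url>
</urlset>"""


def loc_urls(xml_text):
    """Return the text of every <loc> element, with or without a namespace."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # With a declared namespace the tag looks like '{namespace}loc'.
        local_name = element.tag.rsplit('}', 1)[-1]
        if local_name == 'loc' and element.text:
            urls.append(element.text.strip())
    return urls


def find_download_links(xml_text):
    """Return the unique URLs containing '/downloads/', sorted for stable output."""
    return sorted({url for url in loc_urls(xml_text) if '/downloads/' in url})


result = find_download_links(sample)
print(result)
```

Unlike the regex, this version also keeps the tool-b entry, which has no trailing slash, and it raises a parse error on malformed XML instead of silently skipping it. Whether that is preferable depends on how clean the input files are.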