To convert TXT to XML, we will use the Aspose.Cells for Python API, a feature-rich and easy-to-use document manipulation and conversion API for the Python platform.
Steps to Convert TXT to XML via Python
Python developers can load and convert TXT files to XML in just a few lines of code.
- Load TXT file with an instance of Workbook
- Call the Workbook.Save method
- Pass output path with XML extension as parameter
- Check specified path for resultant XML file
System Requirements
Aspose.Cells for Python is a platform-independent API and can be used on any platform (Windows, Linux and macOS); just make sure that the system has Java 1.8 or higher and Python 3.5 or higher.
- Install Java and add it to the PATH environment variable, for example: PATH=C:\Program Files\Java\jdk1.8.0_131;
- Install Aspose.Cells for Python from PyPI using the command: $ pip install aspose-cells
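The requirements above can be checked from Python before installing anything. The following is a minimal sketch, not part of Aspose.Cells: the function name check_prerequisites is hypothetical, and it only verifies that a java executable is reachable through PATH, not which Java version it is.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 5)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare the running interpreter against the minimum Python version stated above.
    if sys.version_info < min_python:
        problems.append("Python %d.%d or higher is required" % min_python)
    # shutil.which searches the PATH environment variable for the executable.
    if shutil.which("java") is None:
        problems.append("java was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```

Running the script prints nothing when both checks pass.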
Free App and Sample Code to Convert TXT to XML
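import jpype
import asposecells

# Start the JVM before importing anything from asposecells.api.
jpype.startJVM()
from asposecells.api import Workbook

# Load the TXT file and save it with an XML extension.
workbook = Workbook("Input.txt")
workbook.Save("Output.xml")

jpype.shutdownJVM()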
Aspose.Cells for Python is an Excel spreadsheet programming library for building cross-platform applications that generate, modify, convert, render and print Excel files. The Python Excel API not only converts between spreadsheet formats, it can also render Excel files as images, PDF, HTML, ODS, CSV, SVG, JSON, WORD, PPT and more, which makes it a good choice for exchanging documents in industry-standard formats.
What is TXT File Format
A file with the .TXT extension is a text document that contains plain text in the form of lines. Paragraphs in a text document are separated by carriage returns, which help arrange the file contents. A standard text document can be opened in any text editor or word processing application on different operating systems. All the text contained in such a file is in human-readable format and is represented by a sequence of characters.
What is XML File Format
XML stands for Extensible Markup Language. It is similar to HTML but differs in that its tags are used to define objects. The idea behind the XML file format was to store and transport data without depending on particular software or hardware tools. Its popularity is due to it being both human-readable and machine-readable. This makes it suitable for creating common data protocols in the form of objects that can be stored and shared over a network such as the World Wide Web (WWW). The "X" in XML stands for extensible, which means the language can be extended with any number of symbols as per user requirements. These features are why many standard file formats make use of it, such as Microsoft Open XML, LibreOffice OpenDocument, XHTML and SVG.
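As a small illustration of XML being both human-readable and machine-readable, here is a sketch using Python's standard xml.etree.ElementTree module. The element names (measurement, operator, tp1) are made up for this example and are not part of any standard.

```python
import xml.etree.ElementTree as ET

# Build a tiny document with user-defined (made-up) tag names.
root = ET.Element("measurement")
ET.SubElement(root, "operator").text = "test1"
ET.SubElement(root, "tp1").text = "17.25"

# Serialize to text that a person can read directly.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# The same text can be parsed back by a program.
parsed = ET.fromstring(xml_text)
print(parsed.find("tp1").text)
```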
Here is a better method of splitting the lines. Notice that the text variable would technically be your .txt file, and that I purposely modified it so that we have a greater context of the output.
from collections import OrderedDict
from pprint import pprint

# Text would be our loaded .txt file.
text = """Serial Number: test Operator ID: test1 Time: 00:03:47 Test Step 1 TP1: 17.25 TP2: 2.46
Serial Number: Operator ID: test2 Time: 00:03:48 Test Step 2 TP1: 17.24 TP2: 2.47"""

# Headers of the intended break-points in the text files.
headers = ["Serial Number:", "Operator ID:", "Time:", "TP1:", "TP2:"]
information = []

# Split our text by lines.
for line in text.split("\n"):
    # Split our text up so we only have the information per header.
    default_header = headers[0]
    for header in headers[1:]:
        line = line.replace(header, default_header)
    info = [i.strip() for i in line.split(default_header)][1:]

    # Compile our header+information together into OrderedDict's.
    compiled_information = OrderedDict()
    for header, info in zip(headers, info):
        compiled_information[header] = info

    # Append to our overall information list.
    information.append(compiled_information)

# Pretty print the information (not needed, only for better display of data.)
pprint(information)
Outputs:
[OrderedDict([('Serial Number:', 'test'),
              ('Operator ID:', 'test1'),
              ('Time:', '00:03:47 Test Step 1'),
              ('TP1:', '17.25'),
              ('TP2:', '2.46')]),
 OrderedDict([('Serial Number:', ''),
              ('Operator ID:', 'test2'),
              ('Time:', '00:03:48 Test Step 2'),
              ('TP1:', '17.24'),
              ('TP2:', '2.47')])]
This method should generalize better than what you are currently writing; the idea behind the code is something I had saved from another project. I recommend going through the code and understanding its logic.
From here you should be able to loop through the information list and create your custom .xml file. I would recommend checking out dicttoxml as well, as it might make your life much easier on the final step.
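One way to do that final step with only the standard library is sketched below. This is an illustration, not the only approach: the helper names header_to_tag and records_to_xml, the tag names tests and test, and the shortened sample data are all assumptions made for this example. Headers are turned into tag names by replacing non-alphanumeric characters, which assumes no header starts with a digit.

```python
import re
import xml.etree.ElementTree as ET
from collections import OrderedDict


def header_to_tag(header):
    """Turn a header such as 'Serial Number:' into a tag name like 'serial_number'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", header).strip("_").lower()


def records_to_xml(records, root_tag="tests", record_tag="test"):
    """Build one <record_tag> element per dict, with one child element per header."""
    root = ET.Element(root_tag)
    for record in records:
        node = ET.SubElement(root, record_tag)
        for header, value in record.items():
            ET.SubElement(node, header_to_tag(header)).text = value
    return root


# Shortened sample data in the same shape as the information list.
information = [
    OrderedDict([("Serial Number:", "test"), ("Operator ID:", "test1"), ("TP1:", "17.25")]),
    OrderedDict([("Serial Number:", ""), ("Operator ID:", "test2"), ("TP1:", "17.24")]),
]

root = records_to_xml(information)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
# To write a file instead:
# ET.ElementTree(root).write("output.xml", encoding="utf-8", xml_declaration=True)
```

Keeping the conversion in its own function means the parsing code does not need to know anything about XML.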
Regarding your code, remember: breaking a job down into fundamental tasks is easier than trying to do them all at once. By trying to create the xml file while you split your txt file, you've created a monster that is hard to tackle when it bites back with bugs. Instead, take it one step at a time: create "checkpoints" that you are 100% certain work, and then move on to the next task.
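As a rough sketch of what such a checkpoint could look like, the parsing step can be wrapped in a function and verified on a known line before any XML code is written. The function name parse_line is hypothetical, and it uses a plain dict instead of OrderedDict for brevity.

```python
def parse_line(line, headers):
    """Checkpoint 1: split one line of text into a {header: value} dict."""
    first = headers[0]
    # Replace every header with the first one so a single split separates all values.
    for header in headers[1:]:
        line = line.replace(header, first)
    values = [part.strip() for part in line.split(first)][1:]
    return dict(zip(headers, values))


headers = ["Serial Number:", "Operator ID:", "Time:", "TP1:", "TP2:"]
sample = "Serial Number: test Operator ID: test1 Time: 00:03:47 Test Step 1 TP1: 17.25 TP2: 2.46"

# Verify the parsing step on a known line before building anything on top of it.
record = parse_line(sample, headers)
assert record["Operator ID:"] == "test1"
assert record["TP2:"] == "2.46"
print("parsing checkpoint passed")
```

Once this step is trusted, the XML step can be developed and checked the same way.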