How do you convert text to a table in Python?

Updated answer

I made some changes to make this easier, and to make it obvious how to actually find the values.

from pprint import pprint
from datetime import datetime


def clean_text(text) -> list:
    """Removes empty lines as well as leading and trailing spaces.
    Also removes EOL characters.

    Args:
        text (str/list): Input text

    Returns:
        list: A list of strings (always a list, even for single-line input)
    """
    if isinstance(text, str):
        splittext = text.splitlines()
    elif isinstance(text, list):
        splittext = text
    else:
        raise TypeError("text must be a str or a list of str")
    result = []
    for line in splittext:
        cleaned = line.strip()
        if cleaned != "":
            result.append(cleaned)
    return result


filename = 'text_sample1.txt'
# assumes the file is UTF-8 encoded (the sample contains a curly apostrophe)
with open(filename, encoding='utf-8') as infile:
    text = infile.read()



# sections in the input file are separated by lines containing a single space
uncleaned_sections = text.split('\n \n')
sections = []
for section in uncleaned_sections:
    sections.append(clean_text(section.splitlines()))

for secindex, section in enumerate(sections):
    for lineindex, line in enumerate(section):
        print(f'sections[{secindex}][{lineindex}]: {line}')


# with the above, we have sections of data, instead of a block of data
# this means we can change the way we deal with it

# we should have 7 sections, provided that all input files are structured
# in the same way
assert len(sections) == 7


pd_dt = datetime.strptime(sections[1][1], '%d %B %Y')
policy_date = f'{pd_dt.day:02}/{pd_dt.month:02}/{pd_dt.year}'
abn = sections[0][2].split('ABN ')[1].split('AFSL')[0].strip()
policy_number = sections[3][3]
period_start = sections[4][1].split("From ")[1].split(" to ")[0].split(' ')[1]
period_end = sections[4][1].split(" to ")[1].split(' ')[1]
insured = ' '.join(sections[2][1:])
insurer = sections[3][1]
interest_insured = ' '.join(sections[5][1:])

as_dict = {
    'Date': policy_date,
    'ABN': abn,
    'Policy Number': policy_number,
    'Period Start': period_start,
    'Period End': period_end,
    'Insured': insured,
    'Insurer': insurer,
    'Interest Insured': interest_insured
}

pprint(as_dict)

OUTPUT

sections[0][0]: Certificate of Currency
sections[0][1]: XYZ Limited
sections[0][2]: ABN 121 011100 54720   AFSL 81141141
sections[1][0]: As at Date
sections[1][1]: 2 November 2015
sections[1][2]: Policy Information
sections[1][3]: Policy Type
sections[1][4]: Professional
sections[2][0]: Insured
sections[2][1]: University of ABC and others as defined by the policy
sections[2][2]: document.
sections[3][0]: Insurer
sections[3][1]: MMO Limited
sections[3][2]: Policy Number(s)
sections[3][3]: L0K107721013
sections[4][0]: Period of Insurance
sections[4][1]: From 4.00pm 1/11/2015 to 4.00pm 1/11/2016
sections[5][0]: Interest Insured
sections[5][1]: Loss incurred as a result of a civil liability claim made against the insured
sections[5][2]: based solely on the insured’s provision of their professional services
sections[6][0]: Limit of Liability
sections[6][1]: $20,000,000 any one claim and $60,000,000 in the aggregate for all claims
sections[6][2]: during the period of insurance. (Subject to the reinstatement provisions of
sections[6][3]: the policy).
sections[6][4]: ABN 121 011100 54720
{'ABN': '121 011100 54720',
 'Date': '02/11/2015',
 'Insured': 'University of ABC and others as defined by the policy document.',
 'Insurer': 'MMO Limited',
 'Interest Insured': 'Loss incurred as a result of a civil liability claim '
                     'made against the insured based solely on the insured’s '
                     'provision of their professional services',
 'Period End': '1/11/2016',
 'Period Start': '1/11/2015',
 'Policy Number': 'L0K107721013'}
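Because the question asks for a table, one possible next step is to write the extracted fields out as CSV, which spreadsheet programs open as a table. The following is a minimal sketch using the standard library's `csv.DictWriter`. The dict copies the values printed above, and `fields_to_csv` is a hypothetical helper name, not part of the original answer.

```python
import csv
import io


def fields_to_csv(rows: list) -> str:
    """Hypothetical helper: render a list of dicts (one per certificate) as CSV text.

    The column order follows the keys of the first dict.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()    # first row: the column names
    writer.writerows(rows)  # one row per parsed certificate
    return buffer.getvalue()


# Values copied from the pprint output above
as_dict = {
    'Date': '02/11/2015',
    'ABN': '121 011100 54720',
    'Policy Number': 'L0K107721013',
    'Period Start': '1/11/2015',
    'Period End': '1/11/2016',
    'Insured': 'University of ABC and others as defined by the policy document.',
    'Insurer': 'MMO Limited',
    'Interest Insured': 'Loss incurred as a result of a civil liability claim made '
                        'against the insured based solely on the insured’s '
                        'provision of their professional services',
}

print(fields_to_csv([as_dict]))
```

To write a real file instead, pass a file opened with `open('policies.csv', 'w', newline='')` to `csv.DictWriter` in place of the `StringIO` buffer (the filename is only an example). Collecting one dict per input file into the list gives one row per certificate.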

NOTE: PREVIOUS ANSWER

This answer is for the previous iteration of the question :-/

You need to process the lines in the text and find the correct lines to extract the data you need.

I've provided examples of how to find the values you listed in your question. I advise you to check w3schools for the basic string methods I used:

string.split()
string.splitlines()
string.strip()
f-strings (I really like f-strings)
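Here is a quick sketch of those four tools applied to lines from the certificate. The sample strings are copied from the question's text.

```python
line = "  ABN 121011100 54720   AFSL 232111  "

cleaned = line.strip()   # drop leading/trailing whitespace
words = cleaned.split()  # split on runs of whitespace
# take the text between two markers, then trim the leftover spaces
abn = cleaned.split("ABN ")[1].split("AFSL")[0].strip()
lines = "As at Date\n2 November 2015".splitlines()  # one string per line

print(f"ABN: {abn}")  # f-string formatting -> ABN: 121011100 54720
print(words)          # ['ABN', '121011100', '54720', 'AFSL', '232111']
print(lines)          # ['As at Date', '2 November 2015']
```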

The code below should be enough to get you started...

CODE

def clean_text(text: str) -> list:
    """Removes empty lines as well as leading and trailing spaces.
    Also removes EOL characters.

    Args:
        text (str): Input text

    Returns:
        list: A list of strings (always a list, even for single-line input,
        so that text.append('') below works)
    """
    splittext = text.splitlines()
    result = []
    for line in splittext:
        cleaned = line.strip()
        if cleaned != "":
            result.append(cleaned)
    return result


text = " \n \nCertificate of Currency \nXYZ Limited \nABN 121011100 54720   AFSL 232111 \n \nAs at Date \n2 November 2015 \nPolicy Information \nPolicy Type \nProfessional  \n \n \nInsured \nUniversity of ABC and others as defined by the policy \ndocument. \n \nInsurer \nMMO Limited   \n                                                    \nPolicy Number(s) \nL0K107721013  \n \nPeriod of Insurance \nFrom 4.00pm 1/11/2015 to 4.00pm 1/11/2016 \n \nInterest Insured \nLoss incurred as a result of a civil liability claim made against the insured \nbased solely on the insured’s provision of their professional services  \n \nLimit of Liability \n$20,000,000 any one claim and $60,000,000 in the aggregate for all claims "
text = clean_text(text)
# I always add this when using the 'previous_line' method below.
# It can reduce failures: the extra pass means a label on the last
# real line is still seen as previous_line.
text.append('')


previous_line = ""
for line in text:
    if "Policy Number(s)" in previous_line:
        policy_number = line
    elif "From " in line and "Period of Insurance" in previous_line:
        # this is a secondary check for the start/end dates
        # Just in case another line in the text contains 'From '
        start = line.split("From ")[1].split(" to ")[0]
        end = line.split(" to ")[1]
    elif "ABN " in line and "AFSL " in line:
        # using a different method than the splits above:
        # split on whitespace and rejoin the two ABN parts
        abn_split = line.split()
        abn_value = f"{abn_split[1]} {abn_split[2]}"

    previous_line = line

# Some try/except blocks to check if the values have been found
try:
    print(f"Start: {start}\nEnd: {end}")
except NameError:
    print("Start/End dates not found")

try:
    print(f"ABN: {abn_value}")
except NameError:
    print("ABN not found")

try:
    print(f"Policy Number: {policy_number}")
except NameError:
    print("Policy number not found")

OUTPUT

Start: 4.00pm 1/11/2015
End: 4.00pm 1/11/2016
ABN: 121011100 54720
Policy Number: L0K107721013  
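The same previous_line idea can also be written as a lookup: pair every line with the line that follows it, then read each value by its label. The sketch below assumes, as in the sample text, that each value sits on the line directly after its label. `value_after` is a hypothetical helper name, and the sample lines are a shortened copy of clean_text's output.

```python
def value_after(lines: list, label: str, default=None):
    """Hypothetical helper: return the line after the first line containing label."""
    # zip(lines, lines[1:]) yields (previous_line, line) pairs, like the loop above
    for previous_line, line in zip(lines, lines[1:]):
        if label in previous_line:
            return line
    return default  # label missing, or it was on the last line


lines = [
    "Insurer",
    "MMO Limited",
    "Policy Number(s)",
    "L0K107721013",
    "Period of Insurance",
    "From 4.00pm 1/11/2015 to 4.00pm 1/11/2016",
]

policy_number = value_after(lines, "Policy Number(s)")
period = value_after(lines, "Period of Insurance")
start, end = period.split("From ", 1)[1].split(" to ")
print(policy_number, start, end, sep="\n")
```

Returning a default instead of leaving a variable undefined means the try/except NameError checks are no longer needed. You can simply test for None.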

How do you convert data into a table in Python?

How to easily create tables in Python:
Install tabulate. We first install the tabulate library using pip in the command line: pip install tabulate
Import the tabulate function. …
List of lists. …
Dictionary of iterables. …
Missing values.
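Here is a short sketch of the steps above. It uses tabulate's `tabulate()` function with a list of lists, a `headers` list, and the `missingval` option, which fills in cells that are None. tabulate is a third-party package (`pip install tabulate`). So that the snippet still runs without it, the `except` branch defines a crude fallback of my own, which is not part of tabulate. The rows reuse fields from the certificate, and the None is placed deliberately to show a missing value.

```python
try:
    from tabulate import tabulate  # third-party: pip install tabulate
except ImportError:
    # Fallback so the sketch runs without tabulate; NOT tabulate's implementation
    def tabulate(rows, headers=(), missingval=""):
        cells = [list(headers)] + [
            [missingval if cell is None else cell for cell in row] for row in rows
        ]
        return "\n".join("  ".join(str(cell) for cell in row) for row in cells)


rows = [
    ["Policy Number", "L0K107721013"],
    ["Insurer", "MMO Limited"],
    ["Limit of Liability", None],  # None marks a value we failed to extract
]

table = tabulate(rows, headers=["Field", "Value"], missingval="N/A")
print(table)
```

tabulate also accepts a dictionary of iterables (one per column) with `headers="keys"`, and a `tablefmt` argument such as `"grid"` or `"github"`. The fallback above handles neither.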

How do I turn a text file into a table?

In Microsoft Word, select the text that you want to convert, and then click Insert > Table > Convert Text to Table. In the Convert Text to Table box, choose the options you want. Under Table size, make sure the numbers match the numbers of columns and rows you want.

How do I convert a TXT to a DataFrame in Python?

Methods to convert a text file to a pandas DataFrame:
read_csv() method
read_table() function
read_fwf() function
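Here is a minimal sketch of the three pandas readers on small in-memory samples. `io.StringIO` stands in for a file path, and the sample data is made up from the certificate fields. `read_csv` splits on commas by default, `read_table` splits on tabs, and `read_fwf` reads fixed-width columns. Here the column boundaries are given explicitly through `colspecs` (half-open character ranges); by default `read_fwf` tries to infer them from the whitespace. pandas is a third-party package (`pip install pandas`).

```python
import io

import pandas as pd

csv_text = "Field,Value\nInsurer,MMO Limited\nPolicy Number,L0K107721013\n"
tsv_text = "Field\tValue\nInsurer\tMMO Limited\nPolicy Number\tL0K107721013\n"
fwf_text = (
    "Field          Value\n"
    "Insurer        MMO Limited\n"
    "Policy Number  L0K107721013\n"
)

df_csv = pd.read_csv(io.StringIO(csv_text))    # comma-separated
df_tsv = pd.read_table(io.StringIO(tsv_text))  # tab-separated by default
# fixed-width: characters 0-14 are the first column, 15-26 the second
df_fwf = pd.read_fwf(io.StringIO(fwf_text), colspecs=[(0, 15), (15, 27)])

print(df_csv)
print(df_fwf)
```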
