Python split ignore escape character

I really know this is an old question, but i needed recently an function like this and not found any that was compliant with my requirements.

Rules:

  • Escape char only works when used with escape char or delimiter. Ex. if delimiter is / and escape are \ then [\a\b\c/abc bacame ['\a\b\c', 'abc']
  • Multiple escapes chars will be escaped. [\\ became \]

So, for the record and if someone look anything like, here my function proposal:

def str_escape_split[str_to_escape, delimiter=',', escape='\\']:
    """Splits an string using delimiter and escape chars

    Args:
        str_to_escape [[type]]: The text to be splitted
        delimiter [str, optional]: Delimiter used. Defaults to ','.
        escape [str, optional]: The escape char. Defaults to '\'.

    Yields:
        [type]: a list of string to be escaped
    """
    if len[delimiter] > 1 or len[escape] > 1:
        raise ValueError["Either delimiter or escape must be an one char value"]
    token = ''
    escaped = False
    for c in str_to_escape:
        if c == escape:
            if escaped:
                token += escape
                escaped = False
            else:
                escaped = True
            continue
        if c == delimiter:
            if not escaped:
                yield token
                token = ''
            else:
                token += c
                escaped = False
        else:
            if escaped:
                token += escape
                escaped = False
            token += c
    yield token

For the sake of sanity, i'm make some tests:

# The structure is:
# 'string_be_split_escaped', [list_with_result_expected]
tests_slash_escape = [
    ['r/casa\\/teste/g', ['r', 'casa/teste', 'g']],
    ['r/\\/teste/g', ['r', '/teste', 'g']],
    ['r/[[[0-9]]\\s+-\\s+[[0-9]]]/\\g\\g/g',
     ['r', '[[[0-9]]\\s+-\\s+[[0-9]]]', '\\g\\g', 'g']],
    ['r/\\s+/ /g', ['r', '\\s+', ' ', 'g']],
    ['r/\\.$//g', ['r', '\\.$', '', 'g']],
    ['u///g', ['u', '', '', 'g']],
    ['s/[/[/g', ['s', '[', '[', 'g']],
    ['s/]/]/g', ['s', ']', ']', 'g']],
    ['r/[\\.]\\1+/\\1/g', ['r', '[\\.]\\1+', '\\1', 'g']],
    ['r/[?

Chủ Đề