Wednesday, February 12, 2014

Python triple-quoted strings vs. raw strings

Python lets you include text as strings in a number of ways, but picking the right one is important. There are two specific types of strings that I get confused and occasionally need a little reference to sort out: triple-quoted strings and raw strings.

A triple-quoted string starts and ends with three quotation marks (single or double) and looks like this:

TRIPLE = """first\nsecond
third"""


whereas a raw string has a preceding letter r and looks like this:

RAWSTRING = r"I want some \nicely\ formatted text"

The triple-quoted string preserves everything in it, including literal line breaks, and it interprets backslash escapes such as "\n" just as regular strings do. It will include anything but another triple quote, which it interprets as the end. If you printed TRIPLE it would look like this:

first
second
third
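
If you want to convince yourself that the escaped "\n" and the literal line break end up as the same character, here is a small sketch. The name PLAIN is just something I made up for the comparison:

```python
# Same string as TRIPLE above: one escaped newline, one literal line break.
TRIPLE = """first\nsecond
third"""

# An ordinary string with two "\n" escapes holds exactly the same characters.
PLAIN = "first\nsecond\nthird"

print(TRIPLE == PLAIN)      # True
print(TRIPLE.splitlines())  # ['first', 'second', 'third']
```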


Raw strings work a little differently. A raw string written with ordinary single or double quotes can't be broken across lines in the middle (that restriction comes from the quotes, not from the r prefix), and it doesn't interpret escape sequences. It will not convert the "\n" to a newline like ordinary strings or triple-quoted strings will. So if you printed RAWSTRING you would see this:

I want some \nicely\ formatted text
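
A quick way to check what a raw string really contains is to test for the characters directly. This is only a sketch using the RAWSTRING from above:

```python
RAWSTRING = r"I want some \nicely\ formatted text"

# The raw string keeps the backslash and the "n" as two separate characters.
print("\n" in RAWSTRING)      # False: there is no newline character in it
print("\\n" in RAWSTRING)     # True: a backslash followed by the letter n
print(len(r"\n"), len("\n"))  # 2 1
```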

Triple-quoting is very convenient for copying data from somewhere and pasting it into a string when you want to preserve the newlines. Raw strings are useful for code-related things that may contain backslashes for other reasons, such as regular expressions.
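
To see why the raw prefix matters for regular expressions, here is a minimal sketch. The sample text and the variable names are made up for illustration. In an ordinary string "\b" is the backspace character, but the regex engine needs to receive a backslash followed by b to treat it as a word boundary:

```python
import re

text = "cat concat cat"

# Ordinary string: "\b" becomes a backspace character before re ever sees it,
# so the pattern looks for backspace + "cat" + backspace.
plain_hits = re.findall("\bcat\b", text)

# Raw string: the backslash reaches the regex engine, where \b is a word boundary.
raw_hits = re.findall(r"\bcat\b", text)

print(plain_hits)  # []
print(raw_hits)    # ['cat', 'cat']
```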

I can use a triple-quoted string to pull the list of users directly from an email and stick it into a string in a Python script, and then I can use a raw string to create a regular expression to pull the usernames out:


#!/usr/bin/env python3

import re

DATA = """Holly Martins (HMARTINS)
Anna Schmidt (ASCHMIDT)
Harry Lime (HLIME)"""



REGEX = r"\s\((\w+)\)"

names = re.findall(REGEX, DATA)
names.sort()
for name in names:
    print(name)

Result:

ASCHMIDT
HLIME
HMARTINS

1 comment:

  1. One more thing: a triple quoted string can also be raw, as in

    r"""..."""

    This treats the backslashes within the triple-quoted string as ordinary characters again, and is in fact useful for forming VERBOSE regular expressions, with comments embedded right inside the regular expression. (See the documentation for the re module.)

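
Following up on the comment above, here is a rough sketch of how the username regex from the post might look as a raw triple-quoted string compiled with re.VERBOSE. The layout and the inline comments are my own. In verbose mode the regex engine ignores whitespace in the pattern and treats "#" as the start of a comment:

```python
import re

DATA = """Holly Martins (HMARTINS)
Anna Schmidt (ASCHMIDT)
Harry Lime (HLIME)"""

# Raw + triple-quoted: backslashes stay literal and the pattern can span lines.
REGEX = re.compile(r"""
    \s      # whitespace before the parenthesis
    \(      # literal opening parenthesis
    (\w+)   # the username, captured
    \)      # literal closing parenthesis
""", re.VERBOSE)

names = sorted(REGEX.findall(DATA))
print(names)  # ['ASCHMIDT', 'HLIME', 'HMARTINS']
```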