PEP 3126 – Remove Implicit String Concatenation

PEP: 3126
Title: Remove Implicit String Concatenation
Author: Jim J. Jewett <JimJJewett at gmail.com>, Raymond Hettinger <python at rcn.com>
Status: Rejected
Type: Standards Track
Created: 29-Apr-2007
Post-History: 29-Apr-2007, 30-Apr-2007, 07-May-2007

Rejection Notice
Abstract
Motivation
- History or Future
- Problem
Solution
Concerns
Transition
Open Issues
References
Copyright

Rejection Notice

This PEP is rejected. There wasn’t enough support in favor, the feature to be removed isn’t all that harmful, and there are some use cases that would become harder.

Abstract

Python inherited many of its parsing rules from C. While this has been generally useful, there are some individual rules which are less useful for python, and should be eliminated.

This PEP proposes to eliminate implicit string concatenation based only on the adjacency of literals.

Instead of:

"abc" "def" == "abcdef"

authors will need to be explicit, and either add the strings:

"abc" + "def" == "abcdef"

or join them:

"".join(["abc", "def"]) == "abcdef"

One goal for Python 3000 should be to simplify the language by removing unnecessary features. Implicit string concatenation should be dropped in favor of existing techniques. This will simplify the grammar and simplify a user’s mental picture of Python. The latter is important for letting the language “fit in your head”. A large group of current users do not even know about implicit concatenation. Of those who do know about it, a large portion never use it or habitually avoid it. Of those who both know about it and use it, very few could state with confidence the implicit operator precedence and under what circumstances it is computed when the definition is compiled versus when it is run.

History or Future

Many Python parsing rules are intentionally compatible with C. This is a useful default, but Special Cases need to be justified based on their utility in Python. We should no longer assume that python programmers will also be familiar with C, so compatibility between languages should be treated as a tie-breaker, rather than a justification.

In C, implicit concatenation is the only way to join strings without using a (run-time) function call to store into a variable. In Python, the strings can be joined (and still recognized as immutable) using more standard Python idioms, such + or "".join.

Problem

Implicit String concatenation leads to tuples and lists which are shorter than they appear; this is turn can lead to confusing, or even silent, errors. For example, given a function which accepts several parameters, but offers a default value for some of them:

def f(fmt, *args):
    print fmt % args

This looks like a valid call, but isn’t:

>>> f("User %s got a message %s",
      "Bob"
      "Time for dinner")

Traceback (most recent call last):
  File "<pyshell#8>", line 2, in <module>
    "Bob"
  File "<pyshell#3>", line 2, in f
    print fmt % args
TypeError: not enough arguments for format string

Calls to this function can silently do the wrong thing:

def g(arg1, arg2=None):
    ...

# silently transformed into the possibly very different
# g("arg1 on this linearg2 on this line", None)
g("arg1 on this line"
  "arg2 on this line")

To quote Jason Orendorff [#Orendorff]

Oh. I just realized this happens a lot out here. Where I work, we use scons, and each SConscript has a long list of filenames:
sourceFiles = [
    'foo.c'
    'bar.c',
    #...many lines omitted...
    'q1000x.c']
It’s a common mistake to leave off a comma, and then scons complains that it can’t find ‘foo.cbar.c’. This is pretty bewildering behavior even if you are a Python programmer, and not everyone here is.

Solution

In Python, strings are objects and they support the __add__ operator, so it is possible to write:

"abc" + "def"

Because these are literals, this addition can still be optimized away by the compiler; the CPython compiler already does so. 2

Other existing alternatives include multiline (triple-quoted) strings, and the join method:

"""This string
   extends across
   multiple lines, but you may want to use something like
   Textwrap.dedent
   to clear out the leading spaces
   and/or reformat.
"""


>>> "".join(["empty", "string", "joiner"]) == "emptystringjoiner"
True

>>> " ".join(["space", "string", "joiner"]) == "space string joiner"

>>> "\n".join(["multiple", "lines"]) == "multiple\nlines" == (
"""multiple
lines""")
True

Concerns

Operator Precedence

Guido indicated 2 that this change should be handled by PEP, because there were a few edge cases with other string operators, such as the %. (Assuming that str % stays – it may be eliminated in favor of PEP 3101 – Advanced String Formatting. 3 4)

The resolution is to use parentheses to enforce precedence – the same solution that can be used today:

# Clearest, works today, continues to work, optimization is
# already possible.
("abc %s def" + "ghi") % var

# Already works today; precedence makes the optimization more
# difficult to recognize, but does not change the semantics.
"abc" + "def %s ghi" % var

as opposed to:

# Already fails because modulus (%) is higher precedence than
# addition (+)
("abc %s def" + "ghi" % var)

# Works today only because adjacency is higher precedence than
# modulus.  This will no longer be available.
"abc %s" "def" % var

# So the 2-to-3 translator can automatically replace it with the
# (already valid):
("abc %s" + "def") % var

Long Commands

… build up (what I consider to be) readable SQL queries 5:

rows = self.executesql("select cities.city, state, country"
                       "    from cities, venues, events, addresses"
                       "    where cities.city like %s"
                       "      and events.active = 1"
                       "      and venues.address = addresses.id"
                       "      and addresses.city = cities.id"
                       "      and events.venue = venues.id",
                       (city,))

Alternatives again include triple-quoted strings, +, and .join:

query="""select cities.city, state, country
             from cities, venues, events, addresses
             where cities.city like %s
               and events.active = 1"
               and venues.address = addresses.id
               and addresses.city = cities.id
               and events.venue = venues.id"""

query=( "select cities.city, state, country"
      + "    from cities, venues, events, addresses"
      + "    where cities.city like %s"
      + "      and events.active = 1"
      + "      and venues.address = addresses.id"
      + "      and addresses.city = cities.id"
      + "      and events.venue = venues.id"
      )

query="\n".join(["select cities.city, state, country",
                 "    from cities, venues, events, addresses",
                 "    where cities.city like %s",
                 "      and events.active = 1",
                 "      and venues.address = addresses.id",
                 "      and addresses.city = cities.id",
                 "      and events.venue = venues.id"])

# And yes, you *could* inline any of the above querystrings
# the same way the original was inlined.
rows = self.executesql(query, (city,))

Regular Expressions

Complex regular expressions are sometimes stated in terms of several implicitly concatenated strings with each regex component on a different line and followed by a comment. The plus operator can be inserted here but it does make the regex harder to read. One alternative is to use the re.VERBOSE option. Another alternative is to build-up the regex with a series of += lines:

# Existing idiom which relies on implicit concatenation
r = ('a{20}'  # Twenty A's
     'b{5}'   # Followed by Five B's
     )

# Mechanical replacement
r = ('a{20}'  +# Twenty A's
     'b{5}'   # Followed by Five B's
     )

# already works today
r = '''a{20}  # Twenty A's
       b{5}   # Followed by Five B's
    '''                 # Compiled with the re.VERBOSE flag

# already works today
r = 'a{20}'   # Twenty A's
r += 'b{5}'   # Followed by Five B's

Internationalization

Some internationalization tools – notably xgettext – have already been special-cased for implicit concatenation, but not for Python’s explicit concatenation. 6

These tools will fail to extract the (already legal):

_("some string" +
  " and more of it")

but often have a special case for:

_("some string"
  " and more of it")

It should also be possible to just use an overly long line (xgettext limits messages to 2048 characters 8, which is less than Python’s enforced limit) or triple-quoted strings, but these solutions sacrifice some readability in the code:

# Lines over a certain length are unpleasant.
_("some string and more of it")

# Changing whitespace is not ideal.
_("""Some string
     and more of it""")
_("""Some string
and more of it""")
_("Some string \
and more of it")

I do not see a good short-term resolution for this.

Transition

The proposed new constructs are already legal in current Python, and can be used immediately.

The 2 to 3 translator can be made to mechanically change:

"str1" "str2"
("line1"  #comment
 "line2")

into:

("str1" + "str2")
("line1"   +#comments
 "line2")

If users want to use one of the other idioms, they can; as these idioms are all already legal in python 2, the edits can be made to the original source, rather than patching up the translator.