Python Essentials

Python language concepts

We'll introduce a few central concepts of the Python language before looking at more complex examples in later chapters. The first of the central concepts is that everything in Python is an object. Several popular languages have primitive types which escape the object-oriented nature of the language. Python doesn't have this feature. Even simple integers are objects, with defined methods.
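As a small illustration of the point that even integers are objects, the following sketch calls methods directly on plain numbers. The variable names are chosen only for this example; bit_length() and as_integer_ratio() are standard methods of int and float.

```python
# Integers are full objects: they have a class and methods.
count = 42
kind = type(count)                     # the int class
is_object = isinstance(count, object)  # True for every Python value
bits = count.bit_length()              # 42 is 0b101010, so 6 bits are needed
ratio = (0.5).as_integer_ratio()       # floats have methods too: (1, 2)

print(kind, is_object, bits, ratio)
```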

Because everything is an object, we're assured of consistent behavior with no special cases. In some languages, the == operator works in one way for primitive types and in another way for objects. Python lacks this divergent behavior. All built-in classes implement the == operator consistently; unless we make specific (and pathological) implementation choices, our own classes will also behave consistently.

This consistency is particularly pleasant when working with strings. In Python, we always compare strings for equality using something like txt.lower() == "hours". This will make the expected character-by-character comparison between the value of txt.lower() and the literal "hours".

Less commonly, we can see if two variables are references to the same underlying object using the is comparison operator. This is generally used to compare a variable with the None object. We use is None because the None object is a proper singleton; there can be only one instance of None. We'll look at this again in Chapter 5, Logic, Comparisons, and Conditions.
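The difference between == and is can be sketched as follows. The variable names here are assumptions made for the example; lists are used for the identity check because two separately built lists are always distinct objects.

```python
txt = "HOURS"
same_text = txt.lower() == "hours"   # value comparison, character by character

first = [1, 2, 3]
second = [1, 2, 3]
equal_values = first == second       # True: the contents match
same_object = first is second        # False: two distinct list objects

result = None
is_missing = result is None          # the usual identity test against None

print(same_text, equal_values, same_object, is_missing)
```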

Object types versus variable declarations

In Python, we specify the processing generically with respect to type. We may write a sequence of statements with the implicit understanding that floating-point values should be used. We can formalize this to an extent using an explicit float() conversion function.
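Here is a minimal sketch of formalizing the floating-point expectation with float(). The function name to_celsius is hypothetical and is not taken from the text.

```python
def to_celsius(fahrenheit):
    """Convert a Fahrenheit reading to Celsius using float arithmetic."""
    # float() accepts an int, a float, or a numeric string such as "212".
    f = float(fahrenheit)
    return (f - 32) * 5 / 9

boiling = to_celsius(212)     # int argument
freezing = to_celsius("32")   # string argument, converted explicitly
print(boiling, freezing)
```

The explicit conversion documents the intent and fails early, with a ValueError, when the argument can't be interpreted as a number.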

In some languages, each variable has a statically defined type. Only objects of the named type can be assigned to the variable.

In contrast to languages with statically defined variables, a Python variable can be understood as a name which is attached to an object. We can attach a name to any object of any class. We don't statically declare a narrow range of allowed types for a variable.
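A short sketch of a name being attached to objects of different classes in turn; the name value is arbitrary.

```python
value = 42
first_type = type(value).__name__    # "int"

value = "forty-two"                  # the same name, now attached to a str
second_type = type(value).__name__   # "str"

value = [4, 2]                       # and now to a list
third_type = type(value).__name__    # "list"

print(first_type, second_type, third_type)
```

This is legal, but reusing one name for unrelated objects in real code quickly becomes confusing.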

Python allows us to assign multiple names to the same object by assigning the object to several variables. For example, when we evaluate a function, the function parameter variable names are assigned to the argument objects. (We'll look at this in more depth in Chapter 7, Basic Function Definitions.) This means that each object may have two variables referring to it: one parameter variable inside the function and another variable outside the function.
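The sharing between a parameter variable and an outside variable can be sketched like this. The function append_marker and the list names are hypothetical names used only for illustration.

```python
def append_marker(items):
    """Mutate the argument and report the identity of the object seen inside."""
    items.append("seen")   # changes the one object that the caller also refers to
    return id(items)

names = ["alpha", "beta"]
inner_id = append_marker(names)

# The parameter `items` and the outer variable `names` referred to the same list.
print(inner_id == id(names), names)
```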

We can use the built-in id() function to see if two variables refer to the same underlying object:

>>> a = "string"
>>> b = a
>>> id(a)
4301974472
>>> id(b)
4301974472

From this, we can see that Python variables a and b have references to the underlying object, not copies of the object.

In the rare cases where object copying is necessary, we must do it explicitly. Details vary, based on the general kind of class. For example, sequences are trivially cloned by creating a slice that includes the entire sequence. Some classes offer a copy() method. Objects can also be cloned via the functions in the copy module.
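The copying options mentioned above can be compared in a small sketch. The variable names are assumptions; the slice, list.copy(), and copy.deepcopy() are standard. Note that the slice and copy() make shallow copies, so nested objects are still shared.

```python
import copy

original = [[1, 2], [3, 4]]
alias = original                  # another name for the same list, not a copy
sliced = original[:]              # shallow copy via a full slice
method_copy = original.copy()     # shallow copy via the copy() method
deep = copy.deepcopy(original)    # independent copy of the nested lists as well

original[0].append(99)            # mutate a nested list

print(alias[0])        # [1, 2, 99]: same outer object
print(sliced[0])       # [1, 2, 99]: the shallow copy shares the inner list
print(deep[0])         # [1, 2]: the deep copy is unaffected
```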

The lack of a fixed type declaration for a variable has several consequences:

  • It's trivial to introduce a variable to decompose a complex expression. Here's a complex expression:
    a = some_function(some_complex_function(another_function(b)))

    We can trivially rewrite this by pulling out subexpressions and assigning them to variables:
    af = another_function(b)
    scf = some_complex_function(af)
    a = some_function(scf)

    We've extracted each subexpression and assigned it to its own variable. We never need to declare what the intermediate result types are.

  • All algorithms are written generically. When we run a script, we apply our generic Python code to concrete objects. Our canonical example of this binding is based on the numeric tower. We can apply the same expression, 32+9*c/5, to objects of the classes complex, float, int, Decimal, and Fraction. All of these classes provide the necessary implementations of the various operators. However, a string object won't implement all of the arithmetic operations required, and won't work. Similarly, we can execute statements like head, *tail = sequence for a wide variety of sequence-like classes, including list, str, bytes, and tuple. However, if we assign a numeric value to the variable named sequence, the statement won't work.
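The generic behavior described in the last point can be sketched as follows. The function names to_fahrenheit and split_first are hypothetical wrappers around the two expressions from the text.

```python
from decimal import Decimal
from fractions import Fraction

def to_fahrenheit(c):
    """Apply the expression from the text to whatever numeric object is given."""
    return 32 + 9 * c / 5

samples = (100, 100.0, Fraction(100), Decimal("100"), complex(100, 0))
results = [to_fahrenheit(c) for c in samples]

# A string supports 9 * c (repetition) but not division, so this fails.
try:
    to_fahrenheit("100")
    string_worked = True
except TypeError:
    string_worked = False

def split_first(sequence):
    """Split any sequence-like object into its first item and the rest."""
    head, *tail = sequence
    return head, tail

print(results, string_worked)
print(split_first([1, 2, 3]), split_first("abc"), split_first(b"abc"))
```

The starred target always collects the remaining items into a list, whatever the class of the original sequence.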

Avoiding the declaration of variables with static types is a great simplification. We can introduce variables as needed. We can write clear, simple, generic software and leave it to the Python runtime processing to determine if the runtime objects have the required implementations for operators and methods.

Avoiding confusion when naming variables

Without variable declarations, there's a small possibility of creating confusing programs if we use vague, generic variable names. A variable with a vague name like list_of_items might get used more than once in a longish sequence of statements. Worse, of course, are variables with names like t or temp.

Tip

Name variables as specifically as possible. Avoid vague, generic names.

The other aspect of overusing variable names is the idea of a "longish" sequence of statements. If the body of a function is so long that generically-named variables could get reused accidentally, the size of the function has become a problem. No stretch of Python code should be so long that the variables used within it are confusing.

Tip

Keep sequences of code short and focused. Avoid long sequences of code where variables might get reused incorrectly.

It's important to name variables simply and clearly. In Python, the use of Hungarian notation to decorate a variable name with type information is considered deplorable. The original concept of Hungarian notation was to place a few characters as a prefix on a variable to indicate the type. In Python, we do not name a variable lst_str_names using a prefix to indicate that the variable refers to a list of string values.

Because Python code is written generically, a well-written function can apply to many different data types. If we try to encode data type information in variable names, we may actually be sowing confusion: the algorithm may work for types not explicitly stated in the variable name.

In some situations, we need to distinguish between a collection of items and an individual item. We might have a name_list and an individual name. Or we might have a name_iter, when working with generator functions, and an individual name. A small, clear naming convention like this is better than elaborately misleading Hungarian notation.
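A brief sketch of this naming convention, and of why a type prefix would mislead. The function longest_name is hypothetical; it works for any iterable of strings, not only for a list.

```python
def longest_name(name_iter):
    """Return the longest string from any iterable of names."""
    longest = ""
    for name in name_iter:          # collection name versus individual item name
        if len(name) > len(longest):
            longest = name
    return longest

name_list = ["Ann", "Roberto", "Li"]
from_list = longest_name(name_list)
from_tuple = longest_name(tuple(name_list))
from_generator = longest_name(n.title() for n in ["ann", "roberto", "li"])

print(from_list, from_tuple, from_generator)
```

Had the parameter been called lst_str_names, the name would wrongly suggest that tuples and generators are not acceptable.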

Tip

Avoid complex Hungarian notation in variable names.

In a more complex program, we might have a dictionary that maps integer keys to sets associated with those keys; each set may have a collection of individual strings. It's difficult to summarize this with a Hungarian prefix or suffix. Would we want to try and call this map_int_set_str_something?

Looking ahead to Chapter 7, Basic Function Definitions and Chapter 11, Class Definitions, we'll often use docstring comments in functions, classes, and modules to capture the details of what kind of structure is appropriate for a function. We may even include test cases in the docstring comments; test cases are perhaps the clearest and most precise way to describe data.

Tip

Write docstring comments in every context that allows them: function, class, module, and package.
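As a sketch of a docstring that describes a data structure and carries its own test cases, consider a function that builds a dictionary mapping integers to sets of strings. The function words_by_length is a hypothetical example, not something defined elsewhere in this book.

```python
def words_by_length(words):
    """Group words by their length.

    :param words: any iterable of str
    :returns: dict mapping an int length to the set of str with that length

    >>> index = words_by_length(["to", "be", "or", "not"])
    >>> sorted(index)
    [2, 3]
    >>> sorted(index[2])
    ['be', 'or', 'to']
    """
    index = {}
    for word in words:
        index.setdefault(len(word), set()).add(word)
    return index

print(words_by_length(["to", "be", "or", "not"]))
```

If this were saved in a file, the examples in the docstring could be checked by running python -m doctest on that file.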

One consequence of Python's use of variables is that we rely on unit test cases to ensure that results are of the expected types as well as being correct. Programmers who work in languages with statically-typed variables are very aware that unit test cases are essential for correctness, even when a compiler does type checking of all variable declarations. In Python, the test cases are just as important as in languages that have static type checking. If it is necessary to clarify the intent of a function or class, we can include type checking in the test cases.

Tip

Write unit tests; use the unittest module, the doctest module, or both.
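A minimal sketch of a unittest test case that checks both the value and the type of a result. The function mean and the class TestMean are hypothetical names chosen for this example.

```python
import unittest

def mean(values):
    """Return the arithmetic mean of a non-empty collection of numbers."""
    values = list(values)
    return sum(values) / len(values)

class TestMean(unittest.TestCase):
    def test_value(self):
        self.assertEqual(mean([1, 2, 3, 4]), 2.5)

    def test_result_type(self):
        # Type checking lives in the test case, not in a variable declaration.
        self.assertIsInstance(mean([2, 4]), float)

# Build and run the suite explicitly so the sketch also works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMean)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
print(outcome.wasSuccessful())
```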

Garbage collection via reference counting

We've seen how expressions create new objects. Even something as simple as 2**2024 creates a new integer object. What happens to these objects? When will we run out of memory?

Python uses reference counting to track how many references there are to each object. Consider what happens when we do something like this:

>>> 2**2024
192624...497216

The resulting object is a very large integer; it is assigned to the variable _ automatically. The object, shown as 192624...497216, has a single reference; this keeps it alive in memory.

When we do this, next:

>>> 2**2025
385248...994432

We get a new object, and it is assigned to the variable _. The large integer value formerly assigned to _ has no more references. Since it's no longer being used, it's garbage, and the memory it occupied can be reused.

Each time we assign an object to a variable, the reference count goes up by one. Each time the variable's value is reassigned, the previous object that is no longer in use has its reference count decreased by one.

When a variable is no longer required, the variable is removed, and the objects referred to by the variable also have their reference counts reduced by one.

Variables belong to namespaces. Most of our early examples used the global namespace. In Chapter 7, Basic Function Definitions, we'll see local namespaces. To summarize: when a namespace is removed, all of the variables in that namespace are removed, and the reference count of each object they referred to is decremented by one.

Tip

When the number of references to an object reaches zero, the object is no longer needed. The memory occupied by that object can be reclaimed.
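The rise and fall of a reference count can be observed with sys.getrefcount(). This is a sketch that assumes the standard CPython implementation; the absolute numbers include a temporary reference held by the call itself, so only the differences are meaningful.

```python
import sys

data = [1, 2, 3]
before = sys.getrefcount(data)       # baseline count for this list

alias = data                         # a second name for the same object
after_alias = sys.getrefcount(data)  # one higher than the baseline

del alias                            # remove the second name
after_del = sys.getrefcount(data)    # back to the baseline

print(after_alias - before, after_del - before)
```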

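We can easily create two complex objects which refer to each other. In the presence of these kinds of circular references, of course, the counts can never reach zero, so reference counting alone will never remove the objects from memory. The standard CPython implementation supplements reference counting with a cyclic garbage collector that finds and reclaims such groups of objects. We can use the gc module to discover more about this.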
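In CPython, the gc module includes a collector that looks for groups of objects that are only reachable from each other. The following sketch assumes CPython and uses a hypothetical Node class; a weak reference serves only as a probe to see whether the object still exists.

```python
import gc
import weakref

class Node:
    """Hypothetical class used only to build a reference cycle."""
    def __init__(self, name):
        self.name = name
        self.partner = None

gc.disable()                   # keep the automatic collector out of the way

left, right = Node("left"), Node("right")
left.partner = right
right.partner = left           # the two objects now refer to each other

probe = weakref.ref(left)      # observes the object without keeping it alive
del left, right                # no names remain, but each object still has a reference
alive_after_del = probe() is not None

gc.collect()                   # ask the cycle collector to run now
alive_after_collect = probe() is not None

gc.enable()
print(alive_after_del, alive_after_collect)
```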

In the case where we must have objects with mutual references, we need to leverage the weakref module. This module provides references among objects that do not interfere with reference counting, allowing a large data structure of multiple objects to gracefully vanish from memory when no longer in use.
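A sketch of mutual references built with weakref, under the assumption of CPython's immediate reclamation when a count reaches zero. The Parent and Child classes are hypothetical: the parent holds ordinary references to its children, while each child holds only a weak reference back to its parent.

```python
import weakref

class Child:
    def __init__(self, name):
        self.name = name
        self.parent = None            # will hold a weak reference, not the parent itself

class Parent:
    def __init__(self, name):
        self.name = name
        self.children = []

    def add(self, child):
        self.children.append(child)       # ordinary (strong) reference
        child.parent = weakref.ref(self)  # weak back-reference; does not add to the count

root = Parent("root")
leaf = Child("leaf")
root.add(leaf)

parent_name = leaf.parent().name   # calling the weak reference yields the parent
del root                           # the last strong reference to the parent is gone
parent_gone = leaf.parent() is None

print(parent_name, parent_gone)
```

Because the back-reference is weak, there is no cycle of strong references, and the parent can be reclaimed by reference counting alone.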

The little-used del statement

We can remove variables manually with the del statement. Here's an example:

>>> a = 2**2024
>>> del a

We've created an integer object, and assigned it to the variable a. When we remove the variable, this will reduce the reference count on the integer object. The memory occupied by the big integer is now eligible to be reclaimed.
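A slightly extended sketch shows that del removes the variable rather than the object: if another variable still refers to the integer, the object stays alive. The second variable b is an addition made for this example.

```python
a = 2**2024
b = a              # a second reference to the same integer object
del a              # removes the name a; the object survives because b refers to it

try:
    a
    name_removed = False
except NameError:  # the variable a no longer exists
    name_removed = True

still_there = b == 2**2024
print(name_removed, still_there)
```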

This kind of thing is done very rarely. Python's ordinary reference counting does almost everything we need. It's generally best not to waste brain calories trying to micro-manage memory allocation.