The Python Workshop
上QQ阅读APP看书,第一时间看更新

Python Algorithms

An algorithm is a series of instructions that can be executed to perform a certain task or computation. A recipe for a cake is an example of an algorithm. For example, preheat the oven, beat 125 g of sugar and 100 g of butter, and then add eggs and other ingredients. Similarly, simple computations in mathematics are algorithms. For example, when computing the perimeter of a circle, you multiply the radius by . It's a short algorithm, but an algorithm, nonetheless.

Algorithms are often initially defined in pseudocode, which is a way of writing down the steps a computer program will make without coding in any specific language. A reader should not need a technical background in order to read the logic expressed in pseudocode. For example, if you had a list of positive numbers and wanted to find the maximum number of positive numbers in that list, an algorithm expressed in pseudocode could be as follows:

  1. Set the maximum variable to 0.
  2. For each number in our list, check whether the number is bigger than the maximum.

    Set the maximum variable so that it is equal to that number.

  3. maximum is now equal to the largest number in the list.

Pseudocode is useful because it allows us to show the logic of our code in a more universally accessible format than writing in a specific programming language. Programmers will often map out their thinking in pseudocode to explore the logic of their planned approach before writing the code itself.

In Exercise 38, The Maximum Number, you will be using this pseudocode to find the maximum number from a list of numbers.

Exercise 38: The Maximum Number

In this exercise, you will implement the pseudocode to find the maximum from a list of positive numbers:

  1. Create a list of numbers:

    l = [4, 2, 7, 3]

  2. Set the maximum variable equal to 0:

    maximum = 0

  3. Look through each number, and compare it to maximum:

    for number in l:

        if number > maximum:

            maximum = number

  4. Check the result:

    print(maximum)

    You should get the following output:

    7

In this exercise, you successfully implemented the pseudocode given and found the maximum in a list of numbers.

Time Complexity

So far, in this book, we have become accustomed to our programs being executed at near-instantaneous speed. Computers are very fast, and the difference between performing 10 iterations in a loop and 1,000 iterations may seem immaterial to us. However, algorithms can quickly become inefficient as the problem grows. In measuring complexity, you are interested in knowing the time it takes to execute the algorithm changes as the size of the problem changes. If the problem is 10 times as large, does the algorithm take 10 times as long to execute, 100 times as long, or 1010 times as long? This relationship between the size of the problem and the steps taken is called the time complexity of an algorithm.

Of course, you could simply time our algorithm on problems of different sizes and observe the relationship on a graph. This technique is often useful when the algorithm is complex, and the theoretical relationship between size and time isn't computable. However, this isn't entirely satisfactory, as the actual time taken is also conditional on factors such as the memory that is available, the processor, the disk speed, and other processes consuming resources on the machine. It will only ever be an empirical approximation and may vary depending on the computer.

Instead, you simply count the number of operations required to execute the algorithm. The result of this counting is expressed with big-O notation. For example, O(n) means that, for a problem of size n, the number of steps taken is proportional to n. This means that the actual number of steps required can be expressed as , where alpha and beta are constants. Another way of thinking about this is that the steps required to execute the algorithm grow linearly with the problem size:

Figure 3.7: Visual representation of linear time complexity

Any such problem where the complexity can be expressed as a linear function, , has a time complexity of O(n).

Other common time complexities include:

  • O(1): Constant time. Here, the time taken is always the same, regardless of the problem size; for example, accessing an element of an array at a given index.
  • O(n^2): Quadratic time. Here, the time taken is proportional to the square of the problem size; for example, the bubble sort algorithm (this is covered in Exercise 39, Using Bubble Sort in Python).
  • O(log n): Logarithmic time. Here, the time taken is proportional to the natural logarithm of the problem size; for example, the binary search algorithm (this is covered in Exercise 41, Binary Search in Python).

Time Complexity for the Maximum Number Algorithm

In the previous exercise, you computed the maximum of a list of positive numbers. Here, you express the complexity of the algorithm using the big-O notation:

  1. Our program starts by setting the maximum = 0 variable. This is our first step: total_steps = 1.
  2. For a list of size n, you are going to loop through each number and perform the following operations:

    (a) Check whether it's greater than the maximum variable.

    (b) If so, assign maximum to the number.

  3. Sometimes, there will be one step executed for a number and, sometimes, there will be two steps (if that number happens to be the new maximum). You don't really care what this number is, so, let's take its average, which you'll call . That is, for each number, there will be an average of steps executed, where is a number between 1 and 2.
  4. So, total_steps = . This is a linear function, so the time complexity is O(n).

Sorting Algorithms

The most commonly discussed family of algorithms in computer science courses are sorting algorithms. Sorting algorithms come to your aid when, say, you have a list of values, and you want to sort these into an ordered list. This problem is ever-present in our data-driven world; consider the following scenarios:

  • You have a database of contacts and want to see them listed alphabetically.
  • You want to retrieve the five best test results from a classroom of students.
  • You have a list of insurance policies and want to see which ones have the most recent claims.

The output of any sorting algorithm must satisfy two conditions:

  1. It must be in non-decreasing order. That is, each element must be equal to or greater than the element that came before it.
  2. It must be a permutation of the input. That is, the input elements must simply be rearranged and not altered.

Here is a simple example of what we want a sorting algorithm to accomplish:

Figure 3.8: A simple problem for a sorting algorithm to solve

One such algorithm for performing this operation is called bubble sort. It is explained as follows:

  1. Start with the first two elements of this list. If the first is larger than the second, then switch the positions of the numbers. In this case, you leave them as is, as 5 < 8:

    Figure 3.9: Step 1 for the bubble sort algorithm

  2. Move onto the next two elements. Here, you switch the positions of 8 and 1:

    Figure 3.10: Step 2 for the bubble sort algorithm

  3. For the next pair, again, switch the positions, as 8 > 3:

    Figure 3.11: Step 3 for the bubble sort algorithm

  4. For the final pair, switch the positions again, as 8 > 2:

    Figure 3.12: Step 4 for the bubble sort algorithm

  5. Go back to the start of the list and repeat the preceding process.
  6. Continue looping through the list until no further swaps are required.

Exercise 39: Using Bubble Sort in Python

In this exercise, you will implement the bubble sort algorithm in Python with a list of numbers:

  1. Start with a list of numbers:

    l = [5, 8, 1, 3, 2]

  2. Create an indicator that will tell us when you can stop looping through the array:

    still_swapping = True

  3. Look through each number, and compare it to maximum:

    while still_swapping:

        still_swapping = False

        for i in range(len(l) - 1):

            if l[i] > l[i+1]:

                l[i], l[i+1] = l[i+1], l[i]

                still_swapping = True

       

  4. Check the result:

    l

    You should get the following output:

    [1, 2, 3, 5, 8]

Bubble sort is a very simple but inefficient sorting algorithm. Its time complexity is O(n^2), meaning that the number of steps required is proportional to the square of the size of the list.

Searching Algorithms

Another important family of algorithms is the searching algorithm. In a world where you are producing an exponentially increasing amount of data, these algorithms have a huge impact on our day-to-day lives. Simply considering the size of Google should give you an appreciation of the importance (and complexity) of these algorithms. Of course, you encounter the need for these algorithms just about every time you pick up a phone or open a laptop:

  • Searching your contacts list to send a message
  • Searching your computer for a specific application
  • Searching for an email containing a flight itinerary

With any of these examples, you can apply the simplest form of search, that is, a linear search. This will involve simply looping through all possible results and checking whether they match the search criteria. For example, if you were searching your contacts list, you would look through each contact one by one, and check whether that contact met the search criteria. If so, return the position of the result. This is a simple but inefficient algorithm, with time complexity of O(n).

Exercise 40: Linear Search in Python

In this exercise, you will implement the linear search algorithm in Python using a list of numbers:

  1. Start with a list of numbers:

    l = [5, 8, 1, 3, 2]

  2. Specify a value to search_for:

    search_for = 8

  3. Create a result variable that has a default value of -1. If the search is unsuccessful, this value will remain -1 after the algorithm is executed:

    result = -1

  4. Loop through the list. If the value equals the search value, set the result, and exit the loop:

    for i in range(len(l)):

        if search_for == l[i]:

            result = i

            break

  5. Check the result:

    print(result)

    You should get the following output:

    1

    Note

    This means that the search found the required value at position 1 in the list (which is the second item in the list, as indices start from 0 in Python).

Another common sorting algorithm is called a binary search. The binary search algorithm takes a sorted array and finds the position of the target value. Suppose that you were trying to find the position of the number 11 in the following list:

Figure 3.13: A simple problem for a search algorithm to solve

The binary search algorithm will be explained as follows:

  1. Take the midpoint of the list. If this value is less than the target value, discard the left half of the list, and vice versa. In this case, our target value of 11 is greater than 8, so you know that you can restrict our search to the right side of the list (since you know the array is sorted):

    Figure 3.14: Splitting the list at the midpoint, 8

    Note

    If there is an even number of items on the list, simply take one of the two middle numbers, it doesn't matter which.

  2. You repeat this process with the right side of the list, picking the midpoint of the remaining values. Since the target value, 11 is less than the midpoint 12, and you discard the right side of our sublist:

    Figure 3.15: Splitting the list in the midpoint of the remaining list

  3. This leaves you with the value that you were searching for:

Figure 3.16: Reaching the final result

Exercise 41: Binary Search in Python

In this exercise, you will implement the binary search algorithm in Python:

  1. Start with a list of numbers:

    l = [2, 3, 5, 8, 11, 12, 18]

  2. Specify the value to search_for:

    search_for = 11

  3. Create two variables that will represent the start and end locations of the sublist you are interested in. Initially, it will represent the start and end indices for the entire list:

    slice_start = 0

    slice_end = len(l) - 1

  4. Add a variable to indicate whether the search was successful:

    found = False

  5. Find the midpoint of the list, and check whether the value is greater or less than the search term. Depending on the outcome of the comparison, either finish the search or update the locations for the start/end of the sublist:

    while slice_start <= slice_end and not found:

        location = (slice_start + slice_end) // 2

        if l[location] == search_for:

            found = True

        else:

            if search_for < l[location]:

                slice_end = location - 1

            else:

                slice_start = location + 1

  6. Check the results:

    print(found)

    print(location)

    You should get the following output:

    True

    4

In this exercise, you successfully implemented the binary search algorithm on a list of numbers.