Collections and Loops#

We have already introduced lists and basic loops, including loops that read each line in a file. In this chapter we will revisit those basics and dig a little deeper. We will look at collections including lists, tuples, and dictionaries. We will introduce some additional operations, and especially we will examine ways of looping through collections.

Collection basics#

A collection type (or collection class) is a kind of data that can contain other data as elements. For example, ["alpha", "beta",  "gamma"] is a list of strings (str), while [1, 2, 3] is a list of integers (int).

In general a collection type will have:

  • A syntax for literals, i.e., a way to write down a value and indicate that we intend a collection of a certain kind. For example, if we write [1, 2, 3], Python will interpret it as a list of integers, while if we write (1, 2, 3) Python will interpret it as a tuple of integers.

  • Operations for building or extending values. For example, if we have a list m = [1, 2, 3], then m.append(17) will make the new value of m be [1, 2, 3, 17].

  • Operations for accessing individual elements. For example, we can index a list. If m is [1, 2, 3, 17], then m[0] is 1 and m[2] is 3.

  • A way of iterating (looping) through the elements of the collection.

For a complete list and detailed description of these features for all the standard collection types in Python, refer to the Python library documentation. In this chapter we will cover just some basics.

Lists#

A list (type list) holds a sequence of elements. The elements can be any other kind of value, including integers and strings but also other collections including lists.

To write a literal list value, we use square brackets, e.g., ["a", "b", "c"].

We can add an element to the end of a list with the append method. Here’s another way to create the value ["a", "b", "c"]:

m = []
m.append("a")
m.append("b")
m.append("c")
m
['a', 'b', 'c']

The elements of a list have indexes, or positions, starting with zero.

m = ["a", "b", "c"]
print(m[0])
print(m[2])
a
c

Suppose instead we have m = [["a", "b"], ["c", "d"]]. Now m is a list of lists of strings. Now m[1] is ["c", "d"]. If we wanted to access the first element of the second sublist, we can write m[1][0] to first index m to get ["c", "d"] and then index ["c", "d"] to get "c".

m = [["a", "b"], ["c", "d"]]
print(m[1][0])
c

We can determine how many elements are in m with len(m). Note that even if a list contains other lists, len counts each element just once (not elements of the sublists).

m = ["a", "b", "c", "d", "e"]
print(len(m))
r = [["a", "b", "c"], ["d", "e", "f"]]
print(len(r))
5
2

There are two main ways to iterate (loop) through the values of a list. The simplest is the way we will use most often: We can ask a for loop to iterate through the elements of the list, like this:

m = ["one", "two", "three"]
for s in m: 
    print(s)
    # or do anything else with s in the body of the loop
one
two
three

Later we will use other approaches, usually because we need to know the position (index) of each element in addition to its value. In that case we might write something like

for s_i in range(len(m)):
    s = m[s_i]
    print(s_i, s)
0 one
1 two
2 three

In the code snippet above, we have used an arbitrary variable name s for elements of m. In an application, we would try to use a more meaningful name. We are also free to choose the name of the index variable. Python does not require that they be related at all. I have chosen s_i to suggest that it is the index for s. You are not required to use such a convention, but in general it is a good idea to help a reader of your code see the meanings and relationships among your variables.

What if we have nested lists, i.e., lists within lists? They often call for nested loops, loops within loops. Suppose for example we have m = [["a", "b"], ["c", "d"]]. We often think of such nested lists as a grid or matrix in which each sublist is a row:

“a”

“b”

“c”

“d”

If we wanted to print all the strings in the sub-lists of m, we might write something like

m = [["a", "b"], ["c", "d"]]
for row in m:
    for s in row:
        print(s)
a
b
c
d

This order of access is called row-major order. We might wonder whether we can iterate through m in column-major order, i.e., down the first column and then down the second column. We can, but it’s a little more complex:

for col_index in range(len(m[0])):
    for row_index in range(len(m)):
        print(row_index, col_index, m[row_index][col_index])
0 0 a
1 0 c
0 1 b
1 1 d

Of course, this approach to looping in column major order works only if the matrix is rectangular, i.e., if every row (sub-list) has the same length. It will also not work for a matrix with zero rows.

Tuples#

The term “tuple” comes from generalizing doubles (pairs), triples, quadruples, quintuples, etc., i.e., sequences of some fixed size. Tuples are very similar to lists, but they are immutable. We can create new tuples based on the contents of other tuples, but we cannot change the value or length of a tuple.

To write a literal tuple value, we separate them by commas, usually in parentheses:

("a", "b", "c").

We cannot add elements to the end of a tuple, or anywhere else.
They are immutable.

While the elements of a tuple have indexes like the elements of a list, and can be accessed in the same manner, we more typically access tuple elements by destructuring:

character = ("Wesley", "Dread pirate Roberts", "As you wish")
name, alias, phrase = character
print(name)
print(alias)
print(phrase)
Wesley
Dread pirate Roberts
As you wish

In principle we could loop through tuples in all the ways we loop through lists. In practice that is rare. Because tuples can never change after they have been created, destructuring is usually more appropriate, even for nested tuples:

pdx = ("Portland International", (45.589,-122.596))
name, (lat, lon) = pdx
print(name)
print(lat)
print(lon)
Portland International
45.589
-122.596

Dictionaries#

While lists and tuples represent sequences, dictionaries (type dict) represent associations between a set of keys and a set of values. We can think of them as “lookup tables”. For example, we might have a dict that associates postal abbreviations with the names of U.S. states:

WA

Washington

OR

Oregon

CA

California

AK

Alaska

We can write a dict literal using curly braces and associating each key to a value with :. We can then “look up” an association by treating the key as an index.

state_abbrevs = {
    "WA": "Washington",
    "OR": "Oregon", 
    "CA": "California", 
    "AK": "Arkansas"
    }
    
or_state = state_abbrevs["OR"]
print(or_state)
Oregon

We can add a new (key, value) pair to a dict using the key as an index.

state_abbrevs["NV"] = "Nevada"

The keys in a dict are unique. If we associate a new value with key, the old (key, value) association is replaced.

state_abbrevs["AK"] = "Alaska"
print(state_abbrevs)
print(state_abbrevs["AK"])
{'WA': 'Washington', 'OR': 'Oregon', 'CA': 'California', 'AK': 'Alaska', 'NV': 'Nevada'}
Alaska

The values in a dict can be any data type, but the keys must be hashable, which in practice means that you should use immutable values as dictionary keys. While we most often use strings as dictionary keys, integers and tuples are also acceptable. Lists cannot be dictionary keys.

good_dict = { (1, "Alpha"): "A", (2, "Beta"): "B" }
print(good_dict[(2, "Beta")])
B
bad_dict = { [1, "Alpha"]: "A", [2, "Beta"]: "B" }
print(bad_dict[[2, "Beta"]])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [15], in <cell line: 1>()
----> 1 bad_dict = { [1, "Alpha"]: "A", [2, "Beta"]: "B" }
      2 print(bad_dict[[2, "Beta"]])

TypeError: unhashable type: 'list'

The in operation tests whether a dict contains a key:

if "TX" in state_abbrevs:
    print(state_abbrevs["TX"])
else: 
    print("TX expansion not found")
TX expansion not found

We cannot directly iterate a dict, but we can obtain a list of key values with the keys method or a list of (key, value) pairs (tuples) with the items method of type dict.

for abbrev in state_abbrevs.keys():
    print(abbrev)
WA
OR
CA
AK
NV

Since the elements in the list returned by the items method are tuples, it is common to destructure them into separate variables for each key and value:

for kv_pair in state_abbrevs.items(): 
    abbrev, name = kv_pair
    print(abbrev, name)
WA Washington
OR Oregon
CA California
AK Alaska
NV Nevada

The destructuring can be done right in the for statement, like this:

for abbrev, name in state_abbrevs.items():
    print(abbrev, name)
WA Washington
OR Oregon
CA California
AK Alaska
NV Nevada

Loops#

Counting#

A very common programming task is to count elements in a collection. If we just need to know the size of the collection, in Python we can use the len method, e.g.,

print(len(state_abbrevs))
5

Often we want to count only elements that satisfy some condition. For example, suppose for some reason we wanted to determine how many elements of a list of strings are state abbreviations:

abbrevs = ["CO", "OR", "WA", "MD"]
states_count = 0
for ab in abbrevs:
    if ab in state_abbrevs:
        states_count = states_count + 1
print(states_count) 
2

Note the pattern: We initialize the count before the loop, add to it within the loop, and do something with the result after the whole loop is complete.

Counting multiple values#

The counting pattern we have considered so far keeps a count of one thing in a variable. What if we wanted to keep a count of several different values? For example, what if we wanted to know that ['dog', 'dog', 'cat', 'dog', 'cat'] is three dogs and two cats? We might not even know that the list will contain only dogs and cats … someone might have snuck in a squirrel or a marmoset or some other random animal. Since we don’t know what values we will encounter, we can’t create a count variable for each one. What we can do, however, is use a dict to keep a collection of count variables.

animals = ['dog', 'dog', 'cat', 'orca', 'dog', 'cat']
counts = { }
for animal in animals: 
    if animal in counts:
        counts[animal] += 1
    else:
        # First time we've encountered this one! 
        counts[animal] = 1
print(counts)
{'dog': 3, 'cat': 2, 'orca': 1}

Summing#

To count items, we always add 1 to the count for each item that satisfies the condition (e.g., items that appear as keys in state_abbrevs). Summing values is almost the same, except instead of adding 1 for each item, we add the relevant value.

populations = {
    "Portland":   641_162,
    "Salem":      177_723,
    "Eugene":	  175_096,
    "Gresham":	  113_103,
    "Hillsboro":  106_633,
    "Bend":       102_059,
    "Beaverton":   98_216,
    "Medford":	   86_367,
    "Springfield": 62_256,
    "Corvallis":   59_864,
    "Albany":	   56_828,
    "Tigard":	   55_767,
    "Aloha":	   53_724
}
itinerary = ["Eugene", "Corvallis", "Albany", "Salem", "Hillsboro"]

pop_sum = 0
for town in itinerary:
    pop_sum += populations[town]
print(pop_sum)
576144

For either counting or summing, it is idiomatic to use the += operation to clearly communicate that the purpose of the incrementing statement.

Scanning#

Another common task is to determine whether some or all elements of a collection satisfy some condition. For example, if we wanted to determine whether all the towns on an itinerary were included in a table, we might write:

all_present = True
for town in itinerary: 
    if town not in populations: 
        all_present = False
        break
print(all_present)
True

The code above illustrates the general pattern to determine a for all property:

  • Initially we assume the condition is true

  • If any element does not satisfy the condition, we conclude the property is false. We do not need to look further (so we can break from the loop).

  • The final answer is known after the loop body.

If we write a for all scan as a function, the logic is similar, but we don’t need an explicit bool variable:

def all_in_table(li: list, table: dict) -> bool: 
    """True if all elements of li are keys in table"""
    for elem in li:
        if elem not in table: 
            return False
    return True
    
print(all_in_table(itinerary, populations))
True

In this version of the for all scan, an early return takes the place of the break from the loop, and the final return True takes the place of initializing a bool variable to True.

We can also scan to determine whether any elements in a collection satisfy a condition. In terms of mathematical logic, we would call this a there exists scan. Suppose we want to know whether at least one of the towns on our itinerary are among the most populous cities in Oregon, which are listed in our population table.

reach_big_city = False
for town in itinerary: 
    if town in populations: 
        reach_big_city = True
        break
print(reach_big_city)
True

In a there exists scan, we can break from the loop as soon as we find any element that satisfies the criterion, but we must finish the whole loop to conclude that there are no satisfying elements.

Like the for all scan, a there exists scan may be implemented as a function:

def exists_in_table(li: list, table: dict) -> bool: 
    """True if any elements of li are keys in table"""
    for elem in li:
        if elem in table: 
            return True
    return False
   
print(exists_in_table(itinerary, populations))
True

Once again, initialization of the boolean variable and the break from the loop are replaced with return statements in appropriate places.

Other iterables#

We have already seen at least one other type of Python data type with behavior similar to lists: After opening a file, we can iterate (loop) through it line by line. You can also loop through the characters in a string:

s = "What?"
for ch in s:
    print(ch)
W
h
a
t
?

There are more. In Python, “things you can loop through” are called iterables. When we loop through indexes for a list l using range(len(l)), we are actually iterating through elements of a range object:

for e in range(3):
    print(e)
0
1
2

It is even possible to create new kinds of collection that you can loop through, but we won’t do that in this course.

Project#

The project for this week asks you to produce a summary report from a class roster. It uses lists, dictionaries, and tuples. The pattern above, using a dict to keep several counts of items in a list, is the key to counting the number of students in each major.