Collections and Loops#
We have already introduced lists and basic loops, including loops that read each line in a file. In this chapter we will revisit those basics and dig a little deeper. We will look at collections including lists, tuples, and dictionaries. We will introduce some additional operations, and especially we will examine ways of looping through collections.
Collection basics#
A collection type (or collection class) is a kind of data that
can contain other data as elements. For example, ["alpha", "beta", "gamma"]
is a list of strings (str
), while [1, 2, 3]
is a list of
integers (int
).
In general a collection type will have:
A syntax for literals, i.e., a way to write down a value and indicate that we intend a collection of a certain kind. For example, if we write
[1, 2, 3]
, Python will interpret it as a list of integers, while if we write(1, 2, 3)
Python will interpret it as a tuple of integers.Operations for building or extending values. For example, if we have a list
m = [1, 2, 3]
, thenm.append(17)
will make the new value ofm
be[1, 2, 3, 17]
.Operations for accessing individual elements. For example, we can index a list. If
m
is[1, 2, 3, 17]
, thenm[0]
is 1 andm[2]
is 3.A way of iterating (looping) through the elements of the collection.
For a complete list and detailed description of these features for all the standard collection types in Python, refer to the Python library documentation. In this chapter we will cover just some basics.
Lists#
A list (type list
) holds a sequence of elements. The elements can
be any other kind of value, including integers and strings but also
other collections including lists.
To write a literal list value, we use square brackets, e.g.,
["a", "b", "c"]
.
We can add an element to the end of a list with the append
method.
Here’s another way to create the value ["a", "b", "c"]
:
m = []
m.append("a")
m.append("b")
m.append("c")
m
['a', 'b', 'c']
The elements of a list have indexes, or positions, starting with zero.
m = ["a", "b", "c"]
print(m[0])
print(m[2])
a
c
Suppose instead we have m = [["a", "b"], ["c", "d"]]
. Now m
is
a list of lists of strings. Now m[1]
is ["c", "d"]
. If we
wanted to access the first element of the second sublist, we can
write m[1][0]
to first index m
to get ["c", "d"]
and then
index ["c", "d"]
to get "c"
.
m = [["a", "b"], ["c", "d"]]
print(m[1][0])
c
We can determine how many elements are in m
with len(m)
. Note
that even if a list contains other lists, len
counts each element
just once (not elements of the sublists).
m = ["a", "b", "c", "d", "e"]
print(len(m))
r = [["a", "b", "c"], ["d", "e", "f"]]
print(len(r))
5
2
There are two main ways to iterate (loop) through the values of a
list. The simplest is the way we will use most often: We can ask a
for
loop to iterate through the elements of the list, like this:
m = ["one", "two", "three"]
for s in m:
print(s)
# or do anything else with s in the body of the loop
one
two
three
Later we will use other approaches, usually because we need to know the position (index) of each element in addition to its value. In that case we might write something like
for s_i in range(len(m)):
s = m[s_i]
print(s_i, s)
0 one
1 two
2 three
In the code snippet above, we have used an arbitrary variable name
s
for elements of m
. In an application, we would try to use a
more meaningful name. We are also free to choose the name of the
index variable. Python does not require that they be related at all.
I have chosen s_i
to suggest that it is the index for s
. You
are not required to use such a convention, but in general it is a
good idea to help a reader of your code see the meanings and
relationships among your variables.
What if we have nested lists, i.e., lists within lists? They
often call for nested loops, loops within loops. Suppose for
example we have m = [["a", "b"], ["c", "d"]]
. We often think of
such nested lists as a grid or matrix in which each sublist is a row:
“a” |
“b” |
“c” |
“d” |
If we wanted to
print all the strings in the sub-lists of m
, we might write
something like
m = [["a", "b"], ["c", "d"]]
for row in m:
for s in row:
print(s)
a
b
c
d
This order of access is called row-major order.
We might wonder whether we
can iterate through m
in column-major order, i.e., down the
first column and then down the second column. We can, but
it’s a little more complex:
for col_index in range(len(m[0])):
for row_index in range(len(m)):
print(row_index, col_index, m[row_index][col_index])
0 0 a
1 0 c
0 1 b
1 1 d
Of course, this approach to looping in column major order works only if the matrix is rectangular, i.e., if every row (sub-list) has the same length. It will also not work for a matrix with zero rows.
Tuples#
The term “tuple” comes from generalizing doubles (pairs), triples, quadruples, quintuples, etc., i.e., sequences of some fixed size. Tuples are very similar to lists, but they are immutable. We can create new tuples based on the contents of other tuples, but we cannot change the value or length of a tuple.
To write a literal tuple value, we separate them by commas, usually in parentheses:
("a", "b", "c")
.
We cannot add elements to the end of a tuple, or anywhere else.
They are immutable.
While the elements of a tuple have indexes like the elements of a list, and can be accessed in the same manner, we more typically access tuple elements by destructuring:
character = ("Wesley", "Dread pirate Roberts", "As you wish")
name, alias, phrase = character
print(name)
print(alias)
print(phrase)
Wesley
Dread pirate Roberts
As you wish
In principle we could loop through tuples in all the ways we loop through lists. In practice that is rare. Because tuples can never change after they have been created, destructuring is usually more appropriate, even for nested tuples:
pdx = ("Portland International", (45.589,-122.596))
name, (lat, lon) = pdx
print(name)
print(lat)
print(lon)
Portland International
45.589
-122.596
Dictionaries#
While lists and tuples represent sequences, dictionaries (type dict
)
represent associations between a set of keys and a set of
values. We can think of them as “lookup tables”. For example,
we might have a dict
that associates postal abbreviations with the
names of U.S. states:
WA |
Washington |
OR |
Oregon |
CA |
California |
AK |
Alaska |
We can write a dict
literal using curly braces and associating
each key to a value with :
. We can then “look up” an association
by treating the key as an index.
state_abbrevs = {
"WA": "Washington",
"OR": "Oregon",
"CA": "California",
"AK": "Arkansas"
}
or_state = state_abbrevs["OR"]
print(or_state)
Oregon
We can add a new (key, value) pair to a dict
using the key as an
index.
state_abbrevs["NV"] = "Nevada"
The keys in a dict
are unique. If we associate a new value with
key, the old (key, value) association is replaced.
state_abbrevs["AK"] = "Alaska"
print(state_abbrevs)
print(state_abbrevs["AK"])
{'WA': 'Washington', 'OR': 'Oregon', 'CA': 'California', 'AK': 'Alaska', 'NV': 'Nevada'}
Alaska
The values in a dict
can be any data type, but the keys must be
hashable, which in practice means that you should use immutable
values as dictionary keys. While we most often use strings as
dictionary keys, integers and tuples are also acceptable. Lists
cannot be dictionary keys.
good_dict = { (1, "Alpha"): "A", (2, "Beta"): "B" }
print(good_dict[(2, "Beta")])
B
bad_dict = { [1, "Alpha"]: "A", [2, "Beta"]: "B" }
print(bad_dict[[2, "Beta"]])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [15], in <cell line: 0>()
----> 1 bad_dict = { [1, "Alpha"]: "A", [2, "Beta"]: "B" }
2 print(bad_dict[[2, "Beta"]])
TypeError: unhashable type: 'list'
The in
operation tests whether a dict
contains a key:
if "TX" in state_abbrevs:
print(state_abbrevs["TX"])
else:
print("TX expansion not found")
TX expansion not found
We cannot directly iterate a dict
, but we can obtain a list of key
values with the keys
method or a list of (key, value) pairs
(tuples) with the items
method of type dict
.
for abbrev in state_abbrevs.keys():
print(abbrev)
WA
OR
CA
AK
NV
Since the elements in the list returned by the items
method are
tuples, it is common to destructure them into separate variables
for each key and value:
for kv_pair in state_abbrevs.items():
abbrev, name = kv_pair
print(abbrev, name)
WA Washington
OR Oregon
CA California
AK Alaska
NV Nevada
The destructuring can be done right in the for
statement, like this:
for abbrev, name in state_abbrevs.items():
print(abbrev, name)
WA Washington
OR Oregon
CA California
AK Alaska
NV Nevada
Loops#
Counting#
A very common programming task is to count elements in a collection.
If we just need to know the size of the collection, in Python we can
use the len
method, e.g.,
print(len(state_abbrevs))
5
Often we want to count only elements that satisfy some condition. For example, suppose for some reason we wanted to determine how many elements of a list of strings are state abbreviations:
abbrevs = ["CO", "OR", "WA", "MD"]
states_count = 0
for ab in abbrevs:
if ab in state_abbrevs:
states_count = states_count + 1
print(states_count)
2
Note the pattern: We initialize the count before the loop, add to it within the loop, and do something with the result after the whole loop is complete.
Counting multiple values#
The counting pattern we have considered so far keeps a count of one
thing in a variable. What if we wanted to keep a count of several
different values? For example, what if we wanted to know that
['dog', 'dog', 'cat', 'dog', 'cat']
is three dogs and two cats?
We might not even know that the list will contain only dogs and cats
… someone might have snuck in a squirrel or a marmoset or some
other random animal. Since we don’t know what values we will
encounter, we can’t create a count variable for each one. What we
can do, however, is use a dict
to keep a collection of count
variables.
animals = ['dog', 'dog', 'cat', 'orca', 'dog', 'cat']
counts = { }
for animal in animals:
if animal in counts:
counts[animal] += 1
else:
# First time we've encountered this one!
counts[animal] = 1
print(counts)
{'dog': 3, 'cat': 2, 'orca': 1}
Summing#
To count items, we always add 1 to the count for each item that
satisfies the condition (e.g., items that appear as keys in
state_abbrevs
). Summing values is almost the same, except
instead of adding 1 for each item, we add the relevant value.
populations = {
"Portland": 641_162,
"Salem": 177_723,
"Eugene": 175_096,
"Gresham": 113_103,
"Hillsboro": 106_633,
"Bend": 102_059,
"Beaverton": 98_216,
"Medford": 86_367,
"Springfield": 62_256,
"Corvallis": 59_864,
"Albany": 56_828,
"Tigard": 55_767,
"Aloha": 53_724
}
itinerary = ["Eugene", "Corvallis", "Albany", "Salem", "Hillsboro"]
pop_sum = 0
for town in itinerary:
pop_sum += populations[town]
print(pop_sum)
576144
For either counting or summing, it is idiomatic to use the +=
operation to clearly communicate that the purpose of the
incrementing statement.
Scanning#
Another common task is to determine whether some or all elements of a collection satisfy some condition. For example, if we wanted to determine whether all the towns on an itinerary were included in a table, we might write:
all_present = True
for town in itinerary:
if town not in populations:
all_present = False
break
print(all_present)
True
The code above illustrates the general pattern to determine a for all property:
Initially we assume the condition is true
If any element does not satisfy the condition, we conclude the property is false. We do not need to look further (so we can
break
from the loop).The final answer is known after the loop body.
If we write a for all scan as a function, the logic is similar,
but we don’t need an explicit bool
variable:
def all_in_table(li: list, table: dict) -> bool:
"""True if all elements of li are keys in table"""
for elem in li:
if elem not in table:
return False
return True
print(all_in_table(itinerary, populations))
True
In this version of the for all scan, an early return
takes the
place of the break
from the loop, and the final return True
takes the place of initializing a bool
variable to True
.
We can also scan to determine whether any elements in a collection satisfy a condition. In terms of mathematical logic, we would call this a there exists scan. Suppose we want to know whether at least one of the towns on our itinerary are among the most populous cities in Oregon, which are listed in our population table.
reach_big_city = False
for town in itinerary:
if town in populations:
reach_big_city = True
break
print(reach_big_city)
True
In a there exists scan, we can break from the loop as soon as we find any element that satisfies the criterion, but we must finish the whole loop to conclude that there are no satisfying elements.
Like the for all scan, a there exists scan may be implemented as a function:
def exists_in_table(li: list, table: dict) -> bool:
"""True if any elements of li are keys in table"""
for elem in li:
if elem in table:
return True
return False
print(exists_in_table(itinerary, populations))
True
Once again, initialization of the boolean variable and the break
from the loop are replaced with return
statements in appropriate
places.
Other iterables#
We have already seen at least one other type of Python data type with behavior similar to lists: After opening a file, we can iterate (loop) through it line by line. You can also loop through the characters in a string:
s = "What?"
for ch in s:
print(ch)
W
h
a
t
?
There are more. In Python, “things you can loop through” are called
iterables. When we loop through indexes for a list l
using
range(len(l))
, we are actually iterating through elements of a range
object:
for e in range(3):
print(e)
0
1
2
It is even possible to create new kinds of collection that you can loop through, but we won’t do that in this course.
Project#
The project for this week
asks you to produce a summary report from a class roster. It uses
lists, dictionaries, and tuples. The pattern above, using a dict
to
keep several counts of items in a list, is the key to counting the
number of students in each major.