Data Structures in Python Part 2: Named Tuples

Python

Wed Nov 29 2023

Not long ago I wrote about common data structures in Python. One of them, which I don't use very often but wanted to explore more, is the named tuple. It's like a normal tuple in that the data is immutable, but it lets you access elements by name instead of just by index, which makes it usable as a simple object definition. This can make your code more readable and self-documenting. Named tuples are part of the collections module and are useful in many scenarios where you would use a regular tuple but want the data to be self-explanatory. Let's dive in and learn more about named tuples and their use cases.
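To make the "immutable but accessible by name" idea concrete, here is a minimal sketch. The Point type and its fields are hypothetical and exist only for illustration.

```python
from collections import namedtuple

# Hypothetical example type, used only for illustration
Point = namedtuple('Point', 'x y')
pt = Point(x=1, y=2)

# Fields are reachable by name or by position, like a regular tuple
print(pt.x, pt[0])

# Assigning to a field fails because the tuple is immutable
try:
    pt.x = 10
except AttributeError as exc:
    print(f"Cannot modify: {exc}")

# _replace returns a new instance with the given fields changed
moved = pt._replace(x=10)
print(moved)
```

The original pt is left untouched; _replace hands back a fresh tuple instead of changing the existing one.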

Use Cases for Named Tuples

Representing Simple Objects: When you want an immutable object but don’t need the complexity of a full class.
Data Parsing: Great for handling CSV, SQL, or other data parsing tasks where each row of data can be represented as a named tuple for clear field access.
Returning Multiple Values: Useful for functions that need to return more than one value, and you want the return value to be self-descriptive.
As a Replacement for Dictionaries: When your data structure doesn’t need to change (since named tuples are immutable) and you want to access elements by an attribute name instead of a key.
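As a rough sketch of the dictionary use case, a fixed-shape dict can be converted to a named tuple and back. The Config type and its values below are assumptions made up for this example.

```python
from collections import namedtuple

# Hypothetical settings record, used only for illustration
Config = namedtuple('Config', 'host port debug')

raw = {'host': 'localhost', 'port': 8080, 'debug': False}

# Unpack the dict into keyword arguments; the keys must match the field names
config = Config(**raw)
print(config.host, config.port)

# _asdict converts the named tuple back into a dictionary
as_dict = config._asdict()
print(as_dict['port'])
```

This only works when the dictionary keys line up exactly with the field names, so it suits data whose shape is known up front.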

Examples

Basic Example:

from collections import namedtuple

# Creating a named tuple class
Person = namedtuple('Person', 'name age gender')

# Creating an instance of the named tuple
p = Person(name="Evan", age=30, gender="Male")

# Accessing the fields
print(p.name)  # Output: Evan
print(p.age)   # Output: 30


Using with CSV Data: Imagine you have a CSV file with employee data. Each row has a name, ID, and role.

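import csv
from collections import namedtuple

Employee = namedtuple('Employee', 'name id role')

# newline='' lets the csv module handle line endings itself
with open('employees.csv', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        # _make builds an Employee from the three values in the row
        emp = Employee._make(row)
        print(emp.name, emp.role)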

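If the CSV file starts with a header row, the field names can come from the file itself. This is a sketch under that assumption; the sample data is made up and io.StringIO stands in for a real file so the snippet runs on its own.

```python
import csv
import io
from collections import namedtuple

# Made-up sample data standing in for a CSV file with a header row
sample = io.StringIO("name,id,role\nAda,1,Engineer\nGrace,2,Admiral\n")

reader = csv.reader(sample)

# The first row supplies the field names
header = next(reader)
Employee = namedtuple('Employee', header)

# Every remaining row becomes one Employee; values stay strings
employees = [Employee._make(row) for row in reader]

for emp in employees:
    print(emp.name, emp.role)
```

Note that csv.reader yields strings, so emp.id is '1' rather than the integer 1 unless you convert it yourself.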

As Return Values from Functions: A function that calculates both area and perimeter of a rectangle.

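from collections import namedtuple

# Define the type once at module level so it isn't rebuilt on every call
RectangleStats = namedtuple('RectangleStats', 'area perimeter')

def rectangle_stats(length, width):
    return RectangleStats(length * width, 2 * (length + width))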

stats = rectangle_stats(10, 5)
print(stats.area)      # Output: 50
print(stats.perimeter) # Output: 30

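Because the return value is still a tuple, callers who prefer plain unpacking can keep using it. A short sketch, reusing the same rectangle function:

```python
from collections import namedtuple

RectangleStats = namedtuple('RectangleStats', 'area perimeter')

def rectangle_stats(length, width):
    return RectangleStats(length * width, 2 * (length + width))

# Unpack positionally, exactly as with a regular tuple
area, perimeter = rectangle_stats(10, 5)
print(area, perimeter)

# Or keep the whole result and read the fields by name
stats = rectangle_stats(3, 4)
print(stats)
```

This means switching an existing function from a plain tuple to a named tuple should not break callers that already unpack the result.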

Syntax

Here's a bit more detail on the syntax:

from collections import namedtuple

# Creating a named tuple
NamedTupleName = namedtuple('NamedTupleName', 'field1 field2 field3')


In this syntax:

'NamedTupleName' is the name of the new named tuple class you are creating.
'field1 field2 field3' is a space-delimited string of field names that the named tuple will have. These fields will be accessible as attributes of the named tuple.

You can also pass the field names as a list of strings, which is useful if the field names are dynamically generated:

fields = ['field1', 'field2', 'field3']
NamedTupleName = namedtuple('NamedTupleName', fields)

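A few other options of namedtuple() are worth knowing when defining fields. The sketch below uses made-up type names (Pair, Server, Row) to show a comma-separated field string, the defaults argument (Python 3.7+), and rename=True, which helps when dynamically generated names might be invalid or duplicated.

```python
from collections import namedtuple

# Field names can also be given as a comma-separated string
Pair = namedtuple('Pair', 'left, right')
print(Pair._fields)

# defaults are applied to the rightmost fields (Python 3.7+)
Server = namedtuple('Server', ['host', 'port', 'scheme'], defaults=[443, 'https'])
server = Server('example.com')
print(server)

# rename=True swaps invalid or duplicate names for positional ones
Row = namedtuple('Row', ['id', 'class', 'id'], rename=True)
print(Row._fields)
```

Without rename=True, the last definition would raise a ValueError because 'class' is a keyword and 'id' appears twice.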

Once the named tuple is defined, you can create instances of it like this:

nt = NamedTupleName(field1='value1', field2='value2', field3='value3')


And access its fields using dot notation:

print(nt.field1)  # Outputs 'value1'

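Keyword arguments and dot notation are not the only options. As a small sketch with the same placeholder names, instances can also be built positionally, unpacked, and iterated over:

```python
from collections import namedtuple

NamedTupleName = namedtuple('NamedTupleName', 'field1 field2 field3')

# Positional arguments are matched to fields in order
nt = NamedTupleName('value1', 'value2', 'value3')

# Unpacking works the same way as with a regular tuple
first, second, third = nt
print(first, third)

# _fields lists the field names, which pairs nicely with the values
pairs = list(zip(nt._fields, nt))
for name, value in pairs:
    print(name, value)
```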

Naming the NamedTuple

When you create a namedtuple, a name appears in two places, and it's worth understanding the role of each:

Variable Name (NamedTupleName in the Example):
- This is the variable to which the new namedtuple class is assigned.
- In typical Python usage, this variable becomes the namedtuple "type" you use to create new instances.
- You use this variable name when creating instances of your namedtuple.
Type Name (First Argument in namedtuple() Function):
- This is a string that specifies the name of the new namedtuple type itself.
- It becomes the __name__ of the class that namedtuple() generates.
- This name shows up in the __repr__ output of the namedtuple and anywhere else the type's name is reported, such as type(p).__name__.

In practice, these two names are often made the same for clarity and consistency, but they don't have to be. Here's an example to illustrate the difference:

from collections import namedtuple

# Creating a namedtuple type
Person = namedtuple('Human', 'name age gender')

# Creating an instance of the namedtuple
p = Person(name="Evan", age=30, gender="Male")

print(p)  # Output will be "Human(name='Evan', age=30, gender='Male')"


In this example:

Person is the variable name we use to refer to the namedtuple type.
'Human' is the internal name of the namedtuple type, which shows up in its representation.

However, as a best practice, it's usually more straightforward to keep these names the same to avoid confusion.
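To see where each name ends up, here is a quick sketch that inspects the mismatched Person/Human type from the example:

```python
from collections import namedtuple

Person = namedtuple('Human', 'name age gender')
p = Person(name="Evan", age=30, gender="Male")

# The class reports the type name, not the variable name
print(Person.__name__)
print(type(p).__name__)

# The variable is still what you use in code, for example with isinstance
print(isinstance(p, Person))
```

The first two lines print Human while the code itself only ever refers to Person, which is exactly the kind of mismatch that makes debugging output harder to follow.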

Named tuples are a great way to make your code more readable and maintainable, especially when dealing with data structures that benefit from attribute-named access. They combine the simplicity of tuples with the readability of objects. I will definitely be using them more in my Python code going forward. Happy coding!
