Sets

Jed Rembold & Fred Agbo

April 5, 2024

Announcements

  • Project ImageShop is due next week Monday at 10pm.
  • You are welcome to attend a guest talk today by Dr Devrim Bilgili related to Data Science
    • Lecture Topic: first lecture in Hypothesis Testing at Ford 202 (10-11 am)
    • Research Topic: SIZE AND SHAPE ANALYSIS OF SILICA (SIO2) AND GOLD (AU) NANOPARTICLES at Ford 101 (11:30 am-12:20 pm)
  • Polling: https://www.polleverywhere.com/agbofred203

Mathematical Sets

  • A set is an unordered collection of distinct values.
    • digits = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
    • evens = { 0, 2, 4, 6, 8 }
    • odds = { 1, 3, 5, 7, 9 }
    • primes = { 2, 3, 5, 7 }
    • squares = { 0, 1, 4, 9 }
    • primary = { red, green, blue }
    • \(\mathbf{R}\) = { x | x is a real number }
    • \(\mathbf{Z}\) = { x | x is an integer }
    • \(\mathbf{N}\) = { x | x is an integer ≥ 0 }
  • The set with no elements is called the empty set (∅)

Pythonic Sets

  • Enclosed within squiggly brackets
  • No key-value pairs, just single values separated by commas
digits = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
squares = { 0, 1, 4, 9 }
primary = { "red", "green", "blue" }
  • Set elements must be immutable
  • Sets themselves are generally mutable
  • Cannot create an empty set using just { }!
    • Python assumes this to be an empty dictionary!
    • Must instead use set().
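
The points above can be tried out directly. This is a minimal sketch; the names repeated, points, and list_allowed are made up for illustration.

```python
# Literal sets use curly braces with comma-separated values.
squares = {0, 1, 4, 9}
primary = {"red", "green", "blue"}

# Duplicate values collapse, since set elements are distinct.
repeated = {1, 1, 2, 2, 3}

# Empty braces give a dictionary, so an empty set needs set().
empty_braces = {}
empty_set = set()

# Elements must be immutable (hashable): tuples work, lists do not.
points = {(0, 0), (1, 2)}
try:
    bad = {[0, 0], [1, 2]}
    list_allowed = True
except TypeError:
    list_allowed = False

print(len(repeated), type(empty_braces).__name__, type(empty_set).__name__)
print(list_allowed)
```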

Set Operations

  • The fundamental set operation is membership (∈)
    • 3 ∈ primes
    • 3 ∉ evens
    • red ∈ primary
    • -1 ∉ N
  • The union of the sets \(A\) and \(B\) (\(A \cup B\)) consists of all elements in either \(A\) or \(B\) or both.
  • The intersection of the sets \(A\) and \(B\) (\(A \cap B\)) consists of all elements in both \(A\) and \(B\).
  • The set difference of \(A\) and \(B\) (\(A - B\)) consists of all elements in \(A\) but not in \(B\).
  • The symmetric set difference of \(A\) and \(B\) (\(A\triangle B\)) consists of all elements in \(A\) or \(B\) but not in both.

Python Implementations

  • Python’s built-in implementation of sets supports all these same operations
  • Can either use appropriately named methods on sets or operators between sets
  • Membership: 3 in primes
  • Union: A.union(B) or A | B
  • Intersection: A.intersection(B) or A & B
  • Difference: A.difference(B) or A - B
  • Symmetric difference: A.symmetric_difference(B) or A ^ B
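
A short sketch of the method and operator forms side by side. The sets A and B here are small made-up examples, not ones from earlier slides.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

is_member = 3 in A                      # membership test

union_method = A.union(B)               # method form
union_op = A | B                        # operator form, same result

common = A & B                          # intersection: in both A and B
only_in_A = A - B                       # difference: in A but not in B
in_exactly_one = A ^ B                  # symmetric difference

print(union_op, common, only_in_A, in_exactly_one)
```

The method forms also accept any iterable as their argument, for example A.union([5, 6]), while the operators require both sides to be sets.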

Venn Diagrams

  • A Venn diagram is a graphical representation of sets in which common elements appear as overlapping areas
  • The following Venn diagrams illustrate the effect of the four primary set operations

[Figure: Venn diagrams of sets A and B illustrating A ∪ B, A ∩ B, A - B, and A ∆ B]

Practice

If we have the following sets from earlier:

digits = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
evens = { 0, 2, 4, 6, 8 }
odds = { 1, 3, 5, 7, 9 }
primes = { 2, 3, 5, 7 }
squares = { 0, 1, 4, 9 }

What is the value of each of the following:

  • evens ∪ squares
  • odds ∩ primes
  • primes - evens
  • odds ∆ squares
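
Once you have worked these out by hand, one way to check your answers is to let Python compute them. This is only a sketch; the result variable names are made up for illustration.

```python
evens = {0, 2, 4, 6, 8}
odds = {1, 3, 5, 7, 9}
primes = {2, 3, 5, 7}
squares = {0, 1, 4, 9}

union_result = evens | squares          # evens ∪ squares
intersection_result = odds & primes     # odds ∩ primes
difference_result = primes - evens      # primes - evens
symmetric_result = odds ^ squares       # odds ∆ squares

# sorted() returns a list, which makes the printed results easier to compare.
for result in (union_result, intersection_result,
               difference_result, symmetric_result):
    print(sorted(result))
```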

Understanding Check

Looking at the same sets:

digits = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
evens = { 0, 2, 4, 6, 8 }
odds = { 1, 3, 5, 7, 9 }
primes = { 2, 3, 5, 7 }
squares = { 0, 1, 4, 9 }

What is the set resulting from: \[ (\text{primes} \cap \text{evens}) \cup (\text{odds}\cap\text{squares})\]

  1. { 1, 2, 9 }
  2. { 1, 3, 4, 5}
  3. { 0, 3, 4, 5, 7}

Set Relationships

  • Sets \(A\) and \(B\) are equal (\(A = B\)) if they have the same elements.
    • This would make them the same circles in a Venn diagram
  • Set \(A\) is a subset of \(B\) (\(A\subseteq B\)) if all the elements in \(A\) are also in \(B\).
    • This would mean that the circle for \(A\) would be entirely inside (or equal to) the circle for \(B\)
  • Set \(A\) is a proper subset of \(B\) (\(A\subset B\)) if \(A\) is a subset of \(B\) and the two sets are not equal
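
Python's comparison operators express these relationships for sets. A minimal sketch with made-up sets A, B, and C:

```python
A = {1, 2, 3}
B = {3, 2, 1}
C = {1, 2, 3, 4}

equal = (A == B)                 # same elements; order does not matter
subset = (A <= C)                # every element of A is also in C
subset_method = A.issubset(C)    # method form of <=
proper = (A < C)                 # subset and not equal
not_proper = (A < B)             # A equals B, so it is not a proper subset

print(equal, subset, subset_method, proper, not_proper)
```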

Informal Proofs

  • You can use Venn diagrams to justify different set identities
  • Example: Say you wanted to show that: \[ A - (B \cap C) = (A-B) \cup (A-C)\]

[Figure: sequence of Venn diagrams over sets A, B, and C]
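
The same identity can also be spot-checked in code by trying every combination of subsets of a small universe. This is a sanity check rather than a proof, and the helper all_subsets is a hypothetical name introduced here.

```python
from itertools import combinations


def all_subsets(universe):
    """Return every subset of universe as a list of sets."""
    items = list(universe)
    return [set(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)]


subsets = all_subsets({1, 2, 3})    # the 8 subsets of a 3-element universe

# Compare both sides of A - (B ∩ C) = (A - B) ∪ (A - C) for every choice.
identity_holds = all(
    A - (B & C) == (A - B) | (A - C)
    for A in subsets
    for B in subsets
    for C in subsets
)
print(identity_holds)
```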

Python Set Methods

  • Can also use “set comprehension” to generate a set: { x for x in range(0,100,2) }

Function            Description
len(set)            Returns the number of elements in a set
elem in set         Returns True if elem is in the set
set.copy()          Creates and returns a shallow copy of the set
set.add(elem)       Adds the specified elem to the set
set.remove(elem)    Removes the element from the set, raising a KeyError if it is missing
set.discard(elem)   Removes the element from the set, doing nothing if it is missing
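
A brief sketch exercising these methods on a small set. The names backup and raised are illustrative only. Note that in Python, remove raises a KeyError when the element is missing.

```python
# Build a set with a set comprehension: the even numbers below 10.
evens = {x for x in range(0, 10, 2)}

backup = evens.copy()     # independent shallow copy

evens.add(10)             # adds a new element
evens.add(10)             # adding it again changes nothing
evens.remove(0)           # removes an element that is present
evens.discard(99)         # missing element: silently does nothing

try:
    evens.remove(99)      # missing element: raises KeyError
    raised = False
except KeyError:
    raised = True

print(len(evens), 10 in evens, 0 in backup, raised)
```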

Why use sets?

Sets come up naturally in many situations
Many real-world applications involve unordered collections of unique elements, for which sets are the natural model.
Sets have a well-established mathematical foundation
If you can frame your application in terms of sets, you can rely on the various mathematical properties that apply to sets.
Sets can make it easier to reason about your program
One of the advantages of mathematical abstraction is that using it often makes it easy to think clearly and rigorously about what your program does.
Many important algorithms are described in terms of sets
If you look at websites that describe some of the most important algorithms in computer science, many of them frame those descriptions in terms of set operations.

Representing Data

  • To use computation effectively, we frequently need to be able to represent real world data in a way that computers can easily work with
    • Real world data is often more complicated or nuanced than just “a list of numbers”
  • Python’s existing data structures are tools, which you can use to help represent certain ideas
    • Lists when you have sequential type data, wherein there is a logical ordering to the data in question (where position matters)
      • Example: GPA over the course of 4 years
    • Tuples or classes when you have elements that should be grouped together but which have no inherent ordering. Generally use tuples for simple records and write custom classes for more complex ones. Could potentially also use a dictionary.
      • Example: Student names in a class
    • Maps or dictionaries when you have specific keys corresponding to other values.
      • Example: Student grades

Tricky Data

  • Human readable data is not always the best machine readable data!
Name Class Q1 Mid Q3 Final
Sally Python A B B A
Jake Python B B B C
James Astro B B A
Lily Astro A A B
Ben Python C B B A
  • Storing the above in a 2D array would work, but would be frustrating to work with

A Computer Friendly Approach

  • Student grades are time ordered, so we could use a list for the grades
  • Each student has a corresponding sequence of grades (and students are unordered), so we could use a dictionary where student names are the keys and the list of grades the values
  • Each class corresponds to an unordered set of students. Could have another dictionary where the keys were the class names and the values were the dictionary of students/grades

Example Representation

{
    "Python": {
        "Sally": ["A", "B", "B", "A"],
        "Jake": ["B", "B", "B", "C"],
        "Ben": ["C", "B", "B", "A"]
    },
    "Astro": {
        "James": ["B", "B", "A"],
        "Lily": ["A", "A", "B"]
    }
}
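
A sketch of how a nested structure like this might be used once it is bound to a name. The variable names gradebook, sally_final, class_sizes, and all_students are assumptions made for illustration.

```python
gradebook = {
    "Python": {
        "Sally": ["A", "B", "B", "A"],
        "Jake": ["B", "B", "B", "C"],
        "Ben": ["C", "B", "B", "A"],
    },
    "Astro": {
        "James": ["B", "B", "A"],
        "Lily": ["A", "A", "B"],
    },
}

# Chain the lookups: class name, then student name, then list position.
sally_final = gradebook["Python"]["Sally"][-1]

# Number of students in each class.
class_sizes = {course: len(students) for course, students in gradebook.items()}

# Students are unordered and unique, so a set is a natural fit for the names.
all_students = {name for students in gradebook.values() for name in students}

print(sally_final, class_sizes, sorted(all_students))
```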

Compound Structure Storage

  • Structures representing complicated data can often be large enough that you don’t want to store them within your program itself
  • We can put them in their own file, but reading them in with our current tools would be complicated
    • Current methods read in text, so we would need to then parse the text to identify what data structures we needed to create and what elements we needed to add
    • This is certainly possible, but is sometimes more overhead than what we would like
  • It can therefore be useful to store the data structure in a file, in a format that can be easily read into Python
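
The text above does not name a format. As an assumption for illustration, the sketch below uses JSON through Python's standard json module, which can write nested dictionaries and lists to a file and read them back. The filename grades.json and the temporary directory are hypothetical.

```python
import json
import os
import tempfile

gradebook = {
    "Astro": {
        "James": ["B", "B", "A"],
        "Lily": ["A", "A", "B"],
    }
}

# Write to a scratch location so the sketch leaves your own files alone.
path = os.path.join(tempfile.mkdtemp(), "grades.json")

with open(path, "w") as f:
    json.dump(gradebook, f, indent=4)    # write the structure as text

with open(path) as f:
    loaded = json.load(f)                # rebuild the dictionaries and lists

print(loaded == gradebook)
```

JSON has no set or tuple type, so a structure containing either would need converting (for example, to lists) before being written this way.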