Reading and Writing

Jed Rembold & Fred Agbo

March 12, 2025

Announcements

  • Problem set #4 is due on Monday 17th at 10pm!
  • Read chapter 10 of the text for Friday’s class and for next week
  • Polling continues on this link here

Review!

Suppose I construct the below 2D array using list comprehension:

A = [[i+j for i in range(3)] for j in range(4)]

What would be the output of:

print([A[i][2] for i in range(len(A))])


  1. [0,1,2]
  2. [2,3,4]
  3. [2,3,4,5]
  4. [2,2,2,2]

Reading

  • Programs often need to work with collections of data that are too large to reasonably type out in the code
    • Easier to read in the values of a list from some external data file
  • A file is the generic name for any named collection of data maintained on some permanent storage media attached to a computer
  • Files can contain information encoded in many different ways
    • Most common is the text file
    • Contains character data like you’d find in a string

Strings vs Text Files

  • While strings and text files both store characters, there are some important differences:
    • The longevity of the data stored
      • The value of a string variable lasts only as long as the variable does: it is lost when the variable is overwritten, when the function that created it returns, or when the program ends
      • Information in a text file exists until the file is deleted
    • How data is read in
      • You have access to all the characters in a string variable pretty much immediately
      • Data from text files is generally read in sequentially, starting from the beginning and proceeding until the end of the file is reached

Reading Text Files

  • The general approach for reading a text file is to first open the file and associate that file with a variable, commonly called its file handle

  • We will also use the with keyword to ensure that Python cleans up after itself (closes the file) when we are done with it (Many of us could use a with irl)

    with open(filename) as file_handle:
        # Code to read the file using the file_handle
  • Python gives you several ways to actually read in the data

    • read reads the entire file in as a string
    • readline reads a single line from the file, while readlines reads all remaining lines into a list of strings
    • read alongside splitlines gets you a list of line strings
    • Can use the file handle as an iterator to loop over
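A small sketch of how readline and readlines behave, since both pick up wherever the previous read stopped. The file name fish.txt and the temporary folder are assumptions made only so the example can run on its own.

```python
import os
import tempfile

# Assumption: a throwaway file in a temporary folder stands in for real data.
path = os.path.join(tempfile.mkdtemp(), "fish.txt")
with open(path, "w") as f:
    f.write("One fish\ntwo fish\nred fish\nblue fish\n")

with open(path) as f:
    first = f.readline()   # reads just the first line, newline included
    rest = f.readlines()   # reads whatever is left, as a list of strings

print(repr(first))
print(rest)
```

Note that readlines only returns three lines here, because readline had already consumed the first one.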

Entire file ⟶ String

  • The read method reads the entire file into a string, which includes newline characters (\n) to mark the end of lines

  • Simple, but can be cumbersome to work with the newline characters, and, for large files, it can take a large amount of memory

  • As an example, the file:

    One fish
    two fish
    red fish
    blue fish

    would get read as

"One fish\ntwo fish\nred fish\nblue fish"

Line by Line

  • Of the ways to read the file a line at a time, using the file handle as an iterator and looping is probably best and certainly most flexible

  • Leads to code that looks like:

    with open(filename) as f:
        for line in f:
            # Do something with the line
  • Note that most strategies preserve the newline character, which you very likely do not want, so be ready to strip them out before doing more processing
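To make the newline issue concrete, here is a minimal sketch that loops over a file and keeps both the raw and the stripped version of each line. The file name and its contents are assumptions for illustration only.

```python
import os
import tempfile

# Assumption: a small demo file written to a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "fish.txt")
with open(path, "w") as f:
    f.write("One fish\ntwo fish\nred fish\nblue fish\n")

raw_lines = []
clean_lines = []
with open(path) as f:
    for line in f:
        raw_lines.append(line)                 # still ends in "\n"
        clean_lines.append(line.rstrip("\n"))  # newline stripped off

print(raw_lines[0] == "One fish\n")
print(clean_lines)
```

Using rstrip("\n") removes only the trailing newline. Calling strip() with no argument would also remove leading and trailing spaces, which may or may not be what you want.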

Powers Combined

  • So long as your files are not gigantic, using read and then the splitlines method can be a good option

  • This does remove the newline characters, since it splits the string at them

    with open(filename) as f:
        lines = f.read().splitlines()
    # Then you can do whatever you want with the list of lines
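As one possible use of the list of lines, the sketch below reads a file of numbers into a list of integers. The file name numbers.txt and the values in it are made up for the example.

```python
import os
import tempfile

# Assumption: a demo data file with one integer per line.
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
with open(path, "w") as f:
    f.write("3\n1\n4\n1\n5\n")

with open(path) as f:
    lines = f.read().splitlines()   # no newline characters left in these strings

values = [int(line) for line in lines]   # convert each line of text to an int
print(lines)
print(sum(values))
```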

Aren’t you Exceptional

  • When opening a file for reading, it is possible the file does not exist!
    • Python handles this (and many other potential errors that can arise) using a mechanism called exception handling
    • Common in other languages as well
  • An exception is an object that belongs to an overarching hierarchy of exception classes
    • Different classes/types for different purposes
    • File operations, for example, use the exception class IOError (in Python 3 this is another name for OSError, and a missing file raises its subclass FileNotFoundError)
  • If open encounters an error, it reports the error by raising an exception with IOError as its type.
    • An exception that is never handled immediately terminates your program, but sometimes that is undesirable

Ignore Yoda, there is a try

  • Python uses the try statement to indicate an interest in trying to handle a possible exception

  • In simplest form, the syntax is:

    try:
        # Code that may cause an exception
    except type_of_exception:
        # Code to handle that type of exception
  • type_of_exception here is the class name of the exception being handled

    • IOError for the file reading errors we are discussing
  • Any exception of that type arising from within the try block, or within functions called from the try block, is “caught”, and the except block is run instead of the program terminating
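A minimal sketch of the try statement in action, assuming a path to a file that does not exist. The helper count_lines and the name no_such_file.txt are hypothetical and only serve the illustration.

```python
import os
import tempfile

# Assumption: this path points at a file that was never created.
missing_path = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")

def count_lines(filename):
    """Return the number of lines in the file, or None if it cannot be opened."""
    try:
        with open(filename) as f:
            return len(f.read().splitlines())
    except IOError as error:
        # For a missing file, Python 3 raises FileNotFoundError,
        # which is a more specific kind of IOError and so is caught here.
        print("Could not open file:", type(error).__name__)
        return None

result = count_lines(missing_path)
print(result)
```

Because the exception is caught, the program carries on and the function returns None rather than crashing.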

Example: Requesting Existing File

  • As an example, the below function will repeatedly ask the user to supply a file name that actually exists.

  • It will not just immediately break should they give it an invalid filename!

    def get_existing_file(prompt="Input a filename: "):
        while True:
            filename = input(prompt)
            try:
                with open(filename):
                    return filename
            except IOError:
                print("That filename is invalid!")
  • If the open call succeeds, we immediately return the filename, but if it fails due to an IOError, we display a message and then keep asking

Choosing Wisely

  • The Python package used to implement pgl.py also supports a mechanism to choose files interactively, made available through the filechooser.py library module.
  • filechooser.py exports two functions:
    • choose_input_file for selecting a file
    • choose_output_file for selecting a folder and filename to save a file to
  • Both open up a file dialog that lets you select/choose a file
    • Clicking Open or Save returns the full pathname of the file
    • Clicking Cancel returns an empty string
  • Using it thus looks something like:
    filename = choose_input_file()
    with open(filename) as f:
        # Code to read file
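Since clicking Cancel returns an empty string, it can be worth checking for that before calling open. The sketch below is one way to do so. The helper read_chosen_file is hypothetical, and two stand-in functions take the place of choose_input_file so the example runs without the course library or a dialog window.

```python
import os
import tempfile

def read_chosen_file(chooser):
    """Return the chosen file's lines, or None if the dialog was cancelled."""
    filename = chooser()
    if filename == "":        # an empty string means Cancel was clicked
        return None
    with open(filename) as f:
        return f.read().splitlines()

# Assumption: a demo file that the "user" will pick.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as f:
    f.write("hello\nworld\n")

def fake_cancel():
    """Stand-in for choose_input_file when the user clicks Cancel."""
    return ""

def fake_open():
    """Stand-in for choose_input_file when the user picks demo_path."""
    return demo_path

cancelled = read_chosen_file(fake_cancel)
chosen = read_chosen_file(fake_open)
print(cancelled, chosen)
```

In real use you would pass choose_input_file itself, as in read_chosen_file(choose_input_file).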

Writing Text Files

  • You can write text files using almost the same syntax as reading:

    with open(filename, mode) as file_handle:
        # Code to write the file using file_handle
  • Note the mode parameter to open here! Mode is a string which is either

    • "w" to write a new file (or overwrite an existing file)
    • "a" to append new contents to the end of an existing file
  • The file handle supports the methods:

    • .write(some_string) to write a string to the file
    • .writelines(iterable_of_strings) to write each string in the iterable to the file (no newline characters are added for you)
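The difference between the two modes can be sketched as follows, using a made-up file name in a temporary folder. Each step reads the file back in so the effect is visible.

```python
import os
import tempfile

# Assumption: a scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:                   # "w" creates the file
    f.write("first\n")

with open(path, "a") as f:                   # "a" adds to the end
    f.writelines(["second\n", "third\n"])    # newlines must be supplied by you

with open(path) as f:
    after_append = f.read()

with open(path, "w") as f:                   # "w" again wipes the old contents
    f.write("replaced\n")

with open(path) as f:
    after_overwrite = f.read()

print(repr(after_append))
print(repr(after_overwrite))
```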

Writing ASCII SINE

  • Suppose I wanted to try my hand at some ASCII art and fill a text file with a vertical oscillating sine wave
  • A sine wave generally looks like: \[ A + A \sin\left(\frac{2\pi}{T}x\right)\] where \(A\) is the amplitude of the wave and \(T\) the period of the wave, or how far \(x\) must advance before the wave repeats
    • The extra \(A +\) out front is to push the wave over to the right, since we can’t really place negative characters
  • How can we put this together?
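As a first step, it may help to see how the formula turns a line number into a count of padding characters. This is only a sketch with small made-up values (A = 3, T = 8), and the function name placement is hypothetical.

```python
from math import sin, pi

def placement(A, T, x):
    """How many padding characters come before the symbol on line x."""
    return int(A + A * sin(2 * pi / T * x))   # int() drops the fractional part

A, T = 3, 8   # assumed demo values: small amplitude, short period
placements = [placement(A, T, x) for x in range(T)]

# Print one period of the wave, one line per value of x.
for p in placements:
    print(" " * p + "X")
```

Every placement falls between 0 and 2A, which is why the extra A out front is enough to keep the symbol from needing a negative amount of padding.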

ASCII SINE Code


from math import sin, pi

def sine_file(filename, A, T, symbol, padding=" "):
    """ 
    Creates a new sine wave in the provided file with the provided amplitude (A),
    and period (T) with the indicated symbol at the end.

    Inputs:
        filename (string): the name of the file to write the art to
        A (int): the amplitude of the wave in terms of number of characters
        T (int): the period of the wave in terms of number of lines
        symbol (string): the symbol to place to mark the wave
        padding (string): what character to pad the left side of the wave with

    Outputs:
        None
    """

    def compute_symb_placement(A, T, x):
        """Computes where the symbol should be placed."""
        value = A * sin(2 * pi / T * x) + A
        return int(value) # to integer character placement

    def construct_line(placement, symbol, padding):
        """Constructs the line with the necessary padding and symbol at the end."""
        return padding * placement + symbol

    with open(filename, 'w') as fh:
        for x in range(10 * T): # write 10 periods worth of lines
            v = compute_symb_placement(A, T, x)
            line = construct_line(v, symbol, padding)
            fh.write(line + '\n') # need the newline character at the end!

if __name__ == '__main__':
    sine_file('sine_test.txt', A=30, T=50, symbol='X')