Lecture 6 - Exception Coding, Modules and Packages


6.1 Exception Coding

Most Python code contains errors when it is first written. For example, in the second cell below, the print function uses the name Var2 instead of the defined variable name var2 (Python names are case-sensitive). Running the cell produces a NameError, with the further description that name 'Var2' is not defined. This message is specific enough for us to realize that the name used in the print function differs from the name we defined.

[1]:
var1 = 5
print(var1)
5
[2]:
var2 = 6
print(Var2)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[2], line 2
      1 var2 = 6
----> 2 print(Var2)

NameError: name 'Var2' is not defined

Errors detected during code execution are called exceptions. In this example, NameError is an exception. When an exception occurs and is not handled, the Python interpreter terminates the program and displays an error.

Errors in Python are displayed in a specific form that provides: the traceback, the type of the exception, and the error message. Traceback is the sequence of function calls that led to the error. In the above example, the arrow indicates that the exception occurred in line 2 of the cell. This example is extremely simple, and in actual programs the traceback will list all modules and functions which led to the exception. Most often, you can just pay attention to the last level in the traceback, which is the actual place where the error occurred.
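
To illustrate how a traceback records the chain of calls, here is a minimal sketch that uses the standard traceback module. The function names inner and outer are hypothetical and are not part of the lecture code, and the try and except keywords used here are explained in the next section.

```python
import traceback

def inner():
    # The error actually happens here.
    return 1 / 0

def outer():
    # outer() calls inner(), so both appear in the traceback.
    return inner()

try:
    outer()
except ZeroDivisionError as err:
    # extract_tb returns one entry per level of the traceback,
    # ordered from the outermost call to the place of the error.
    frames = traceback.extract_tb(err.__traceback__)
    call_chain = [frame.name for frame in frames]

print(call_chain)
print('Error occurred in:', call_chain[-1])
```

The last entry of call_chain is inner, which matches the advice above: the last level of the traceback is where the error actually occurred.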

The try/except/else Statement

To handle exceptions in our programs, we can use try and except. This is also known as catching the exception. In the following cell, the code that can cause an exception to occur is indented under the try header, and the NameError is listed after except. If the exception occurs, the block indented under except is executed. Notice that this time the cell ran despite the error in our code, and we only printed a statement.

[3]:
try:
    var2 = 6
    print(Var2)
except NameError:
    print('Oops, something went wrong!')
Oops, something went wrong!

If the try part succeeds (i.e., there are no errors in the block of indented statements under try), then the except part is not executed.

[4]:
try:
    var2 = 6
    print(var2)
except NameError:
    print('Oops, something went wrong!')
6

Similarly, if we try adding an integer number and a string, this will result in a TypeError.

[5]:
123 + 'abc'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 123 + 'abc'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

We can again use try and except to catch this exception, only this time we use TypeError instead of NameError after the except keyword.

[6]:
try:
    123 + 'abc'
except TypeError:
     print('Oops, something went wrong!')
Oops, something went wrong!

What if we used NameError in the except clause instead of TypeError? The result is shown below. The exception was not caught this time: when an exception is raised in the try block, Python compares its type with the types listed in the except clauses, and runs a handler only if one of them matches. This means that the except clause must name the correct exception type (or one of its parent classes) for the exception to be caught.

[7]:
try:
    123 + 'abc'
except NameError:
     print('Oops, something went wrong!')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 2
      1 try:
----> 2     123 + 'abc'
      3 except NameError:
      4      print('Oops, something went wrong!')

TypeError: unsupported operand type(s) for +: 'int' and 'str'

But then, what if we are not sure about the type of exception that we expect to occur in our code? One solution is to add except clauses for both NameError and TypeError. This way, we can catch either a NameError or a TypeError exception.

[8]:
try:
    123 + 'abc'
except NameError:
    print('Oops, wrong name error!')
except TypeError:
    print('Oops, wrong type error!')
Oops, wrong type error!
[9]:
try:
    var2 = 6
    print(Var2)
except NameError:
    print('Oops, wrong name error!')
except TypeError:
    print('Oops, wrong type error!')
Oops, wrong name error!

Python allows inserting multiple except clauses under a single try statement to catch different exception types.

It is also possible to catch any of multiple exceptions by providing a tuple of exception types after the except keyword, as shown in the following example.

[10]:
try:
    123 + 'abc'
except (NameError, TypeError):
    print('Oops, wrong name or wrong type error!')
Oops, wrong name or wrong type error!

When there are multiple except clauses, the try block is executed line by line until an exception is raised; Python then checks the except clauses from top to bottom and runs the first one that matches. In the following example, that is the NameError exception, and the print line under except NameError is executed. The error in the line 123 + 'abc' is never reached, because the execution of the try block is interrupted as soon as the first exception is raised.

[11]:
try:
    var2 = 6
    print(Var2)
    123 + 'abc'
except TypeError:
    print('Oops, wrong type error!')
except SyntaxError:
    print('Oops, wrong syntax error!')
except NameError:
    print('Oops, wrong name error!')
except (IndexError, IndentationError):
    print('Oops, wrong index or indentation error!')
Oops, wrong name error!

Another alternative is to write only except without specifying any exception type. An empty except clause will catch all exception types, and with that, we don’t need to list the expected error types in the code.

[12]:
try:
    var2 = 6
    print(Var2)
except:
    print('Oops, something went wrong!')
Oops, something went wrong!

Despite this convenience, it is not generally recommended to use the empty except statement very often. One reason is that in the previous example we will only know that something was wrong with our code, but we won’t know what caused the error. This makes fixing the program difficult. In addition, the empty except statement can also catch some system errors that are not related to our code (such as system exit, Ctrl+C interrupt). And even worse, it may also catch genuine programming mistakes in our code for which we probably want to see an error message.

Therefore, it is better to be specific about what types of exceptions we want to catch and where, instead of catching everything we can in the whole program.

Similarly, writing Exception after the except keyword catches almost all exceptions, and acts much like an empty except clause. Unlike an empty except clause, except Exception does not catch system-exiting exceptions such as SystemExit and KeyboardInterrupt, and it is therefore somewhat preferred, but it should still be used with caution.

[13]:
try:
    var2 = 6
    print(Var2)
except Exception:
    print('Oops, something went wrong!')
Oops, something went wrong!
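
The difference between an empty except clause and except Exception can be checked directly with the built-in issubclass function. This is only a small sketch; the helper name caught_by_except_exception is hypothetical.

```python
def caught_by_except_exception(exc_type):
    # except Exception matches any class derived from Exception.
    return issubclass(exc_type, Exception)

for exc_type in (NameError, TypeError, KeyboardInterrupt, SystemExit):
    print(exc_type.__name__, caught_by_except_exception(exc_type))
```

NameError and TypeError give True, while KeyboardInterrupt and SystemExit give False, because they derive directly from BaseException rather than from Exception.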

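It is also possible to catch an exception and store it in a variable. In the following cell, the as keyword binds the caught exception to the name var3, and we then store it in the variable my_error. Python removes the name var3 when the except block ends, which is why we copy the exception to another variable.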

[14]:
try:
    123 + 'abc'
except TypeError as var3:
    my_error = var3
[15]:
my_error
[15]:
TypeError("unsupported operand type(s) for +: 'int' and 'str'")
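
Once an exception is bound to a name, its type and message can be inspected. The following sketch uses a ValueError raised by int('abc'); the variable names error_type, error_message, and error_args are chosen here for illustration only.

```python
try:
    int('abc')
except ValueError as err:
    # Copy what we need; the name err is removed when the except block ends.
    error_type = type(err).__name__
    error_message = str(err)
    error_args = err.args

print(error_type)
print(error_message)
print(error_args)
```

Here str(err) gives the error message, and err.args is a tuple holding the arguments that were passed when the exception was created.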

The syntax of the try/except statement can also include an optional else statement. The block of statements indented under else is executed if there is no exception caught in the try block.

In this example, there is an exception in the print(Var2) line, and because of that, the statement under except is executed.

[16]:
try:
    var2 = 6
    print(Var2)
    print('This message is not printed')
except NameError:
    print('Oops, something went wrong!')
else:
    print('The code is executed successfully, no exception occurred!')
Oops, something went wrong!

On the contrary, the following code does not raise an exception, and therefore, the statements under try are executed, and also, the statement under else is executed.

[17]:
try:
    var2 = 6
    print(var2)
    print('This message is printed')
except NameError:
    print('Oops, something went wrong!')
else:
    print('The code is executed successfully, no exception occurred!')
6
This message is printed
The code is executed successfully, no exception occurred!

The general syntax of the try/except/else statement is as shown below. It is a compound, multipart statement, that starts with a try header. It is followed by one or more except blocks, which identify exceptions to be caught and blocks to process them. The else statement is optional, and it is listed after the except blocks; the else block runs if no exceptions are encountered. The words try, except, and else should be indented to the same level (vertically aligned).

try:
   Place your operations here.
   ...
   ...
except ExceptionI:
   If there is ExceptionI, then execute this block.
except ExceptionII:
   If there is ExceptionII, then execute this block.
except (ExceptionIII, ExceptionIV):
    If there is ExceptionIII or ExceptionIV, then execute this block.
except ExceptionV as Var1:
    If there is ExceptionV, store it in the variable Var1, and then execute this block.
except:
    If there are any other exceptions, then execute this block.
   ...
   ...
else:
   If there is no exception, then execute this block.
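
As a runnable instance of the template above, the following sketch wraps a division in try/except/else. The function name divide and the returned messages are hypothetical choices made for this illustration.

```python
def divide(numerator, denominator):
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        return 'Cannot divide by zero'
    except TypeError as err:
        return 'Wrong type: ' + str(err)
    else:
        # Runs only when the try block raised no exception.
        return result

print(divide(10, 4))
print(divide(10, 0))
print(divide(10, 'a'))
```

Keeping only the risky operation inside try, and the follow-up work inside else, makes it clearer which line each except clause is meant to guard.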

The finally Statement

The finally statement is another statement that can be combined with try. The general syntax is shown below. The goal is to always execute the block of code indented under finally regardless of whether there was an exception in the try block or not.

try:
   Place your operations here
   ...
   ...
   Due to exceptions, these lines of code may be skipped.
finally:
   This code block is always executed, regardless of whether exceptions occurred.

The try/finally form is useful when we want to be completely sure that an action will happen after some code runs, regardless of the exception behavior of the program. In practice, this allows us to specify cleanup actions that must always occur, such as closing files or disconnecting from servers.

The next example opens a file named testfile for writing, then writes some text, and closes the file. The code under finally is executed.

[18]:
try:
    f = open('testfile', 'w')
    f.write('First sentence, second sentence, end')
    f.close()
finally:
    print('The finally code block is always executed')
The finally code block is always executed

Then, this cell reads the file.

[19]:
try:
    f = open('testfile', 'r')
    print(f.read())
    f.close()
finally:
    print('The finally code block is always executed')
First sentence, second sentence, end
The finally code block is always executed

For practice, let’s make an intentional mistake and try to open a file for reading that does not exist. As expected, we got a FileNotFoundError, however the code under finally was still executed.

[20]:
try:
    f = open('wrongfile', 'r')
    print(f.read())
    f.close()
finally:
    print('The finally code block is always executed')
The finally code block is always executed
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[20], line 2
      1 try:
----> 2     f = open('wrongfile', 'r')
      3     print(f.read())
      4     f.close()

File ~\anaconda3\Lib\site-packages\IPython\core\interactiveshell.py:284, in _modified_open(file, *args, **kwargs)
    277 if file in {0, 1, 2}:
    278     raise ValueError(
    279         f"IPython won't let you open fd={file} by default "
    280         "as it is likely to crash IPython. If you know what you are doing, "
    281         "you can use builtins' open."
    282     )
--> 284 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: 'wrongfile'

The finally clause can also be combined with except and else. The logic remains the same, that is, the block under finally will always be executed. In this example, the exception is caught, and the print statements under both except and finally are executed.

[21]:
try:
    f = open('wrongfile', 'r')
    print(f.read())
    f.close()
except FileNotFoundError:
    print('Oops, there is no such file')
finally:
    print('The finally code block is always executed')
Oops, there is no such file
The finally code block is always executed
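
In the cells above, f.close() sits inside the try block, so it would be skipped if the read or write raised an exception. A minimal sketch of placing the cleanup itself under finally is given here; the temporary directory and the names path, f2, and text are assumptions made for this illustration. The with statement, which is not covered in this section, performs the same cleanup automatically.

```python
import os
import tempfile

# Hypothetical file location, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'testfile')

f = open(path, 'w')
try:
    f.write('First sentence, second sentence, end')
finally:
    # Runs whether or not write() raised an exception.
    f.close()

print(f.closed)

# The with statement closes the file when its block ends.
with open(path, 'r') as f2:
    text = f2.read()

print(text)
print(f2.closed)
```

The file is opened before try, so that finally only runs f.close() when the open call actually succeeded.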

Error Types

Besides the above types NameError and TypeError, let’s briefly look at several other common error types in Python.

SyntaxError occurs when there is a problem with the structure of the code in the program (e.g., in the first cell below we forgot the single quote at the end of the string, which is reported as an unterminated string literal). IndexError points to wrong indexing of sequences. IndentationError, FileNotFoundError, and ZeroDivisionError are self-explanatory.

[22]:
print('Hello world)
  Cell In[22], line 1
    print('Hello world)
          ^
SyntaxError: unterminated string literal (detected at line 1)

[23]:
list1 = [1, 2, 3]
list1[10]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[23], line 2
      1 list1 = [1, 2, 3]
----> 2 list1[10]

IndexError: list index out of range
[24]:
def func1():
    msg = 'Hello world'
    print(msg)
     return msg
  Cell In[24], line 4
    return msg
    ^
IndentationError: unexpected indent

[25]:
myfile = open('newfile.txt', 'r')
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[25], line 1
----> 1 myfile = open('newfile.txt', 'r')

File ~\anaconda3\Lib\site-packages\IPython\core\interactiveshell.py:284, in _modified_open(file, *args, **kwargs)
    277 if file in {0, 1, 2}:
    278     raise ValueError(
    279         f"IPython won't let you open fd={file} by default "
    280         "as it is likely to crash IPython. If you know what you are doing, "
    281         "you can use builtins' open."
    282     )
--> 284 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: 'newfile.txt'
[26]:
10/0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[26], line 1
----> 1 10/0

ZeroDivisionError: division by zero
[27]:
# The zero division error is not detected because the indentation error was first detected and the line didn't run
x = 0
 print(10/x)
  Cell In[27], line 3
    print(10/x)
    ^
IndentationError: unexpected indent

The main built-in exceptions in Python are shown below. Detailed explanations about each exception can be found in the Built-in Exceptions section of the official Python documentation.

Note that the exceptions have a hierarchy, where for instance, catching an ArithmeticError exception will catch everything that is under it in the tree, i.e., FloatingPointError, OverflowError, and ZeroDivisionError. There are also a few exceptions that are not in this tree, like SystemExit and KeyboardInterrupt (they derive from BaseException rather than from Exception), but most of the time we shouldn’t catch these exceptions.

Exception
├── ArithmeticError
│   ├── FloatingPointError
│   ├── OverflowError
│   └── ZeroDivisionError
├── AssertionError
├── AttributeError
├── BufferError
├── EOFError
├── ImportError
├── LookupError
│   ├── IndexError
│   └── KeyError
├── MemoryError
├── NameError
│   └── UnboundLocalError
├── OSError
│   ├── BlockingIOError
│   ├── ChildProcessError
│   ├── ConnectionError
│   │   ├── BrokenPipeError
│   │   ├── ConnectionAbortedError
│   │   ├── ConnectionRefusedError
│   │   └── ConnectionResetError
│   ├── FileExistsError
│   ├── FileNotFoundError
│   ├── InterruptedError
│   ├── IsADirectoryError
│   ├── NotADirectoryError
│   ├── PermissionError
│   ├── ProcessLookupError
│   └── TimeoutError
├── ReferenceError
├── RuntimeError
│   └── NotImplementedError
├── StopIteration
├── SyntaxError
│   └── IndentationError
│       └── TabError
├── SystemError
├── TypeError
├── ValueError
│   └── UnicodeError
│       ├── UnicodeDecodeError
│       ├── UnicodeEncodeError
│       └── UnicodeTranslateError
└── Warning
    ├── BytesWarning
    ├── DeprecationWarning
    ├── FutureWarning
    ├── ImportWarning
    ├── PendingDeprecationWarning
    ├── ResourceWarning
    ├── RuntimeWarning
    ├── SyntaxWarning
    ├── UnicodeWarning
    └── UserWarning
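
The effect of the hierarchy can be seen in a short sketch: an except clause that names a parent class also catches its child classes. The list name results is hypothetical.

```python
results = []

try:
    10 / 0
except ArithmeticError as err:
    # ZeroDivisionError is under ArithmeticError in the tree.
    results.append(type(err).__name__)

try:
    [1, 2, 3][10]
except LookupError as err:
    # IndexError is under LookupError in the tree.
    results.append(type(err).__name__)

try:
    {'a': 1}['b']
except LookupError as err:
    # KeyError is also under LookupError.
    results.append(type(err).__name__)

print(results)
```

The list ends up holding the names of the specific exceptions that were raised, even though only the parent classes were named in the except clauses.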

The raise Statement

In Python, we can also trigger exceptions and create error messages manually. This is known as raising an exception, and it is coded with the raise keyword followed by the exception and an optional error message.

The general syntax is as follows:

if test_condition:
    raise Exception(Message)

In the following example, we raise an exception and stop the program if x is less than 0. In the parentheses, we specified the text that is to be displayed in the error message.

[28]:
x = -1

if x < 0:
    raise Exception('Sorry, no numbers below zero')
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[28], line 4
      1 x = -1
      3 if x < 0:
----> 4     raise Exception('Sorry, no numbers below zero')

Exception: Sorry, no numbers below zero

We can also define the type of exception to raise after the raise keyword, such as TypeError in the next example.

[29]:
y = 'hello'

type(y)
[29]:
str
[30]:
if type(y) is not int:
    raise TypeError('Only integers are allowed')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[30], line 2
      1 if type(y) is not int:
----> 2     raise TypeError('Only integers are allowed')

TypeError: Only integers are allowed

When the test condition is False, the raise statement indented under if is not executed, and no exception is raised.

[31]:
y = 3

if type(y) is not int:
    raise TypeError('Only integers are allowed')

We can also create an exception instance ahead of time, and raise it afterward in our code.

[32]:
my_exception = TypeError('Sorry, the input should be an integer number')

z = 'one'

if type(z) is not int:
    raise my_exception
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[32], line 6
      3 z = 'one'
      5 if type(z) is not int:
----> 6     raise my_exception

TypeError: Sorry, the input should be an integer number

In the above cell, my_exception is in fact an instance of the class TypeError. The error message that we typed above is an attribute of the created instance of TypeError class.

Additionally, the raise statement can be used alone, without an exception name. In that case, it simply reraises the current exception. This form is typically used when we need to catch and handle an exception, but don’t want the handler to suppress it, so that the exception continues to propagate.

Consider the following example where except catches a ZeroDivisionError.

[33]:
a = 10
b = 0
try:
    print(a/b)
except ZeroDivisionError:
    print('Oops, something went wrong')
Oops, something went wrong

Including the raise statement alone at the end of the except block causes the exception to be reraised.

[34]:
a = 10
b = 0
try:
    print(a/b)
except ZeroDivisionError:
    print('Oops, something went wrong')
    raise
Oops, something went wrong
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[34], line 4
      2 b = 0
      3 try:
----> 4     print(a/b)
      5 except ZeroDivisionError:
      6     print('Oops, something went wrong')

ZeroDivisionError: division by zero
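
A common use of the bare raise is inside a function that reports a problem and then lets the caller decide what to do. The sketch below assumes a hypothetical helper named to_int.

```python
def to_int(text):
    try:
        return int(text)
    except ValueError:
        print('Could not convert:', text)
        raise  # pass the same exception on to the caller

try:
    to_int('abc')
except ValueError as err:
    message = str(err)
    print('Caller received:', message)
```

The function prints its own note, and the caller still receives the original ValueError with its original message.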

Here is another example, where the function quad is used for calculating the roots of a quadratic function with coefficients a, b, and c. The function raises the user-defined QuadError exception if the function is not quadratic, or if it does not have real roots. The raise statement allows us to introduce application-specific errors in our code.

[35]:
import math
class QuadError(Exception): pass

def quad(a, b, c):
    if a == 0:
        raise QuadError('Not quadratic')

    if b*b-4*a*c < 0:
        raise QuadError('No real roots')

    x1 = (-b+math.sqrt(b*b-4*a*c))/(2*a)
    x2 = (-b-math.sqrt(b*b-4*a*c))/(2*a)

    return (x1, x2)
[36]:
x1, x2 = quad(3, 4, 4)
print("Roots are", x1, x2)
---------------------------------------------------------------------------
QuadError                                 Traceback (most recent call last)
Cell In[36], line 1
----> 1 x1, x2 = quad(3, 4, 4)
      2 print("Roots are", x1, x2)

Cell In[35], line 9, in quad(a, b, c)
      6     raise QuadError('Not quadratic')
      8 if b*b-4*a*c < 0:
----> 9     raise QuadError('No real roots')
     11 x1 = (-b+math.sqrt(b*b-4*a*c))/(2*a)
     12 x2 = (-b-math.sqrt(b*b-4*a*c))/(2*a)

QuadError: No real roots
[37]:
x1, x2 = quad(1, -5, 6)
print("Roots are", x1, x2)
Roots are 3.0 2.0
[38]:
x1, x2 = quad(0, -5, 6)
print("Roots are", x1, x2)
---------------------------------------------------------------------------
QuadError                                 Traceback (most recent call last)
Cell In[38], line 1
----> 1 x1, x2 = quad(0, -5, 6)
      2 print("Roots are", x1, x2)

Cell In[35], line 6, in quad(a, b, c)
      4 def quad(a, b, c):
      5     if a == 0:
----> 6         raise QuadError('Not quadratic')
      8     if b*b-4*a*c < 0:
      9         raise QuadError('No real roots')

QuadError: Not quadratic
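
User-defined exceptions can be caught like any built-in exception. The sketch below repeats QuadError and quad so that it runs on its own (with the discriminant computed once), and adds a hypothetical wrapper function named report.

```python
import math

class QuadError(Exception):
    pass

def quad(a, b, c):
    if a == 0:
        raise QuadError('Not quadratic')
    discriminant = b*b - 4*a*c
    if discriminant < 0:
        raise QuadError('No real roots')
    root = math.sqrt(discriminant)
    return ((-b + root) / (2*a), (-b - root) / (2*a))

def report(a, b, c):
    try:
        x1, x2 = quad(a, b, c)
    except QuadError as err:
        # Only the application-specific error is handled here.
        return 'Cannot solve: ' + str(err)
    else:
        return 'Roots are ' + str(x1) + ' ' + str(x2)

print(report(1, -5, 6))
print(report(3, 4, 4))
print(report(0, -5, 6))
```

Because the handler names QuadError only, any other unexpected error inside quad would still be displayed with a full traceback.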

The assert Statement

The assert statement is similar to the raise statement, and it can be thought of as a conditional raise statement.

The general syntax is:

assert test_condition, message(optional)

If the test condition evaluates to False, Python raises an AssertionError exception. If the message item is provided, it is used as the error message in the displayed exception.

Conversely, if the test condition evaluates to True, the assert statement does nothing and the program continues to the next line.

Like all exceptions, the AssertionError exception will terminate the program if it’s not caught with a try statement.

In the following example, assert is used to ensure that the values for the Temperature are non-negative.

[39]:
def KelvinToFahrenheit(Temperature):
    assert Temperature >= 0, 'Colder than absolute zero!'
    return ((Temperature-273)*1.8)+32
[40]:
KelvinToFahrenheit(273)
[40]:
32.0
[41]:
KelvinToFahrenheit(-10)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[41], line 1
----> 1 KelvinToFahrenheit(-10)

Cell In[39], line 2, in KelvinToFahrenheit(Temperature)
      1 def KelvinToFahrenheit(Temperature):
----> 2     assert Temperature >= 0, 'Colder than absolute zero!'
      3     return ((Temperature-273)*1.8)+32

AssertionError: Colder than absolute zero!

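The next two cells show the same check written first with the raise statement and then with an equivalent assert statement. Note that the raise version produces a generic Exception, whereas assert always produces an AssertionError.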

[42]:
x = -1

if x < 0:
    raise Exception('Sorry, no numbers below zero')
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[42], line 4
      1 x = -1
      3 if x < 0:
----> 4     raise Exception('Sorry, no numbers below zero')

Exception: Sorry, no numbers below zero
[43]:
x = -3

assert x >= 0, 'Sorry, no numbers below zero'
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[43], line 3
      1 x = -3
----> 3 assert x >= 0, 'Sorry, no numbers below zero'

AssertionError: Sorry, no numbers below zero

Here is one more simple example, where assert is used to ensure that no empty lists are passed to the function average.

[44]:
def average(marks):
    assert len(marks) != 0, 'List is empty'
    return sum(marks)/len(marks)
[45]:
marks1 = [55, 88, 78, 90, 79]
print('Average of marks is:', average(marks1))
Average of marks is: 78.0
[46]:
marks2 = []
print('Average of marks is:', average(marks2))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[46], line 2
      1 marks2 = []
----> 2 print('Average of marks is:', average(marks2))

Cell In[44], line 2, in average(marks)
      1 def average(marks):
----> 2     assert len(marks) != 0, 'List is empty'
      3     return sum(marks)/len(marks)

AssertionError: List is empty

Assert is typically used to verify program conditions during development (such as user-defined constraints), rather than for catching genuine programming errors. Because Python catches programming errors itself, there is usually no need to use assert to catch things like zero divides, out-of-bounds indexes, type mismatches, etc. In general, assertions are useful for checking types, classes, or values of inputted variables, checking data structures such as duplicates in a list or contradictory variables, and checking that outputs of functions are reasonable and as expected.
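
One caveat, stated here as general Python behavior rather than something taken from the lecture: assert statements are skipped when the interpreter runs in optimized mode (python -O). Checks that must always run are therefore often written with an explicit raise. The sketch below shows such a variant of the average function; the name average_checked is hypothetical.

```python
def average_checked(marks):
    # An explicit raise is executed even when assertions are disabled.
    if len(marks) == 0:
        raise ValueError('List is empty')
    return sum(marks) / len(marks)

print(average_checked([55, 88, 78, 90, 79]))

try:
    average_checked([])
except ValueError as err:
    message = str(err)
    print('Invalid input:', message)
```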

6.2 Module Coding Basics

Every Python file with code is referred to as a module. To create modules, we don’t need to write special syntax to tell Python that we are making a module. We can simply use any text editor to type Python code into a text file, and save it with a .py extension; any such file is automatically considered a Python module.

For example, I have created a simple file called my_module.py that is saved in the same directory as this Jupyter notebook. The module does not do anything useful, it just defines a few names and prints a few statements. The code inside my_module.py is shown below.

[Image: listing of the code in my_module.py]
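
Since the listing is not reproduced as text, the following is a reconstruction of what my_module.py plausibly contains, inferred only from the outputs shown later in this lecture (the printed messages, the values of X and Y, and the names reported by dir). The exact original code, including the order of statements and any comments, is an assumption.

```python
# my_module.py (reconstructed sketch, not the original file)

print('I am inside my_module')

X = 3
Y = 5

print('The value of the variable X is:', X)

def sub_report():
    Z = 8  # local to the function, not a module attribute
    print('The value of the variable Z is:', Z)
    print('I am a function named sub_report')

def main_report():
    U = 10  # local to the function, not a module attribute
    print('The value of the variable U is:', U)
    print('I am a function named main_report')
```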

Module names follow the same rules as other variable names in Python: they can contain only letters, digits, and underscores, and cannot start with a digit. Module names also cannot be Python reserved keywords (e.g., a module file named if.py cannot be imported with the import statement).

The import Statement

Python programs can use the module files we have created by running an import or from statement. These statements find, compile, and run a module file’s code. The main difference is that import fetches the module as a whole, while from fetches specific names out of the module.

Let’s import my_module. Python executes the statements in the module file one after another, from the top of the file to the bottom. For this module, the two print statements at the top level of the file are executed. The print statements inside the two functions (main_report and sub_report) are not executed; they will be executed only when the functions sub_report and main_report are called.

[47]:
import my_module
I am inside my_module
The value of the variable X is: 3

Note that we don’t use the .py extension for the files with the import statement (i.e., import my_module.py will raise an exception).

When the module is imported, a new module object is created. The module object is shown below, where Python mapped the module name to an external filename by adding a directory path from the module search path to the file, and a .py extension at the end.

[48]:
# The name my_module refers to the loaded module object
my_module
[48]:
<module 'my_module' from 'C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules\\my_module.py'>

Overall, the name my_module serves two different purposes: 1. It identifies the external file my_module.py that needs to be loaded. 2. After the module is loaded, it becomes a reference to the module object.

During importing, all the names assigned at the top level of the module become attributes of the module object. In this example, the variables X and Y and the functions sub_report and main_report become attributes of the module, and we can access them by using the object.attribute syntax (a.k.a. qualification).

[49]:
my_module.X
[49]:
3
[50]:
my_module.Y
[50]:
5
[51]:
my_module.sub_report()
The value of the variable Z is: 8
I am a function named sub_report

The from Statement

The from statement fetches specific names from the module, and allows us to use the names directly (without the need for module_object.attribute). This way, we can call the names in the module with less typing.

[52]:
from my_module import X
X
[52]:
3

The from statement in effect copies the names out of the module into another scope; in this case, in the scope of this Jupyter notebook, where the from statement appears.

When we run a from statement, internally Python first imports the entire module file as usual, then copies the specific names out of the module into the current scope, and finally discards the name of the module itself (the module file is not deleted, and the module object stays loaded). This is similar to the following code:

import my_module
X = my_module.X
del my_module
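
That from copies names, rather than linking them to the module, can be checked with a short sketch. It uses the standard math module instead of my_module so that it runs anywhere.

```python
import math
from math import pi

# Rebinding the copied name changes only the name in our own scope.
pi = 3

print(pi)
print(math.pi)
```

After the assignment, pi in our scope is 3, while math.pi keeps its original value.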

With from, we can also import several names at the same time, separated by commas.

[53]:
from my_module import X, Y, sub_report
[54]:
sub_report()
The value of the variable Z is: 8
I am a function named sub_report

Another alternative is to use a * instead of specific names, which fetches all names assigned at the top level of the referenced module (by default, except names that begin with an underscore). The following code fetches all four names in our module: X, Y, sub_report, and main_report. Note again that the names Z and U are not defined at the top level in the module, but are enclosed in the functions, and therefore, they cannot be fetched with the from statement.

[55]:
from my_module import *
main_report()
The value of the variable U is: 10
I am a function named main_report
[56]:
U
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[56], line 1
----> 1 U

NameError: name 'U' is not defined

One problem with using from module import * is that it can silently overwrite variables that have the same name as existing variables in our scope.

In the following example, we have a variable X = 15, which was overwritten by the variable X with the same name in my_module which has the value 3. The way this variable was overwritten may not be obvious (e.g., in large modules with many variables we cannot remember and keep track of all variable names).

[57]:
X = 15
from my_module import *
print(X)
3

On the other hand, if we use import, all names will be defined only within the scope of the module, and the names will not collide with other names in our programs.

[58]:
X = 15
import my_module
# The print statements were not displayed this time (the reason is explained in the Appendix)
[59]:
print(X)
print(my_module.X)
15
3

Therefore, programmers need to be careful when using the from statement (especially with *), and the plain import statement should generally be preferred. However, from also provides the convenience of less typing, and it is still very commonly used.

When Using import is Required

When the same name of a variable or function is defined in two different modules, and we need to use both of the names at the same time, then we must use the import statement.

For instance, let’s assume that another module file named module_no_2.py also contains a variable X and a function main_report.

[Image: listing of the code in module_no_2.py]

Using import we can load the two different variables X, because including the name of the enclosing module makes the two names unique.

[60]:
import my_module # already imported earlier, so its code is not executed again
import module_no_2 # imported for the first time, so its top-level code is executed
I am inside module_no_2
The value of the variable X is: 22
[61]:
print(my_module.X)
print(module_no_2.X)
3
22

The same holds for the function main_report which appears in both modules.

[62]:
my_module.main_report()
module_no_2.main_report()
The value of the variable U is: 10
I am a function named main_report
The value of the variable Y is: 15
I am a function named main_report

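In this case, the from statement cannot give us both values, because a scope can hold only one assignment to the name X. No error is raised: the second from statement silently rebinds X to the value from module_no_2.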

[63]:
# Only one variable name X can exist at one time
from my_module import X
from module_no_2 import X
print(X)
22

Another way to resolve the name clash is to use the as extension to from/import, which imports a name under a different name that is then used as a synonym.

[64]:
from my_module import X as X1
from module_no_2 import X as X2
print(X1)
print(X2)
3
22
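
The same behavior can be reproduced without our own module files. The sketch below uses the standard library modules math and cmath, which both define a function named sqrt; they stand in for my_module and module_no_2, and the synonyms real_sqrt and complex_sqrt are names chosen only for this illustration.

```python
from math import sqrt
from cmath import sqrt  # rebinds sqrt: the name now refers to cmath.sqrt

# math.sqrt(-4) would raise ValueError, so this call proves cmath.sqrt is in use
overwritten = sqrt(-4)

# The as extension keeps both functions reachable under unique names
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt

print(overwritten)
print(real_sqrt(4))
print(complex_sqrt(-4))
```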

Module Namespaces

Modules can be understood as places where collections of names are defined that we want to make visible to the rest of our code. These collections of names live in the module’s namespace and represent the attributes of the module object.

To access the namespace of the my_module object, we can use the built-in dir function. We can notice the names we assigned in the module file: X, Y, main_report, and sub_report. However, Python also adds some names to the module’s namespace for us; for instance, __file__ gives the path to the file the module was loaded from, and __name__ gives the module name.

[65]:
dir(my_module)
[65]:
['X',
 'Y',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'main_report',
 'sub_report']

Internally, the module namespaces created by imports are stored as dictionary objects. Module namespaces can also be accessed through the built-in __dict__ attribute associated with module objects, where the names are dictionary keys.

[66]:
my_module.__dict__.keys()
[66]:
dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__file__', '__cached__', '__builtins__', 'X', 'Y', 'sub_report', 'main_report'])
[67]:
my_module.__dict__['__file__']
[67]:
'C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules\\my_module.py'
[68]:
my_module.__dict__['__name__']
[68]:
'my_module'
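
The link between module attributes and the namespace dictionary can be checked directly. This is a small sketch that uses the standard library module math in place of my_module, plus an empty module object created with types.ModuleType; the name demo_namespace is hypothetical.

```python
import math
import types

# vars(module) returns the module's namespace dictionary, i.e. module.__dict__
namespace = vars(math)
print(namespace is math.__dict__)
print(namespace["pi"] == math.pi)
print(math.__name__)

# Assigning an attribute on a module object adds a key to its namespace
demo = types.ModuleType("demo_namespace")
demo.X = 3
print(demo.__dict__["X"])
print("X" in dir(demo))
```

Attribute access such as demo.X and dictionary access such as demo.__dict__['X'] reach the same object, which is why dir and __dict__ report the same user-defined names.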

6.3 Module Packages

When we create programs in Python, it is helpful to organize the individual module files related to an application into sub-directories. A directory of Python code is said to be a package or modules package. Importing a directory is known as a package import.

For example, consider the directory MyMainPackage which is located in the same directory as this Jupyter notebook.

MyMainPackage
    ├── __init__.py
    ├── main_script.py
    └── MySubPackage
        ├── __init__.py
        └── sub_script.py

To import the module file sub_script.py, which is located inside the directory MySubPackage, we can use the dotted syntax MyMainPackage.MySubPackage.sub_script, as shown in the following cell. In effect, this turns the directory MyMainPackage into a Python namespace, which has attributes corresponding to the sub-directories and module files that the directory contains.

[69]:
import MyMainPackage.MySubPackage.sub_script
I am inside sub_script, which is located in MySubPackage
The value of the variable X is: 23

As we learned previously, import fetches a module as a whole, and the names (variables and functions) that are defined in the module sub_script.py become attributes of the imported object. These include the variable X and the function sub_report.

[70]:
MyMainPackage.MySubPackage.sub_script.X
[70]:
23
[71]:
MyMainPackage.MySubPackage.sub_script.sub_report()
I am a function inside sub_script
The value of the variable Y is: 5

The dotted path in the cell corresponds to the path through the directory hierarchy that leads to the module file sub_script.py, i.e., MyMainPackage\MySubPackage\sub_script.py.

On the other hand, note that syntax with backward slashes does not work with the import statement.

[72]:
import MyMainPackage\MySubPackage\sub_script
  Cell In[72], line 1
    import MyMainPackage\MySubPackage\sub_script
                         ^
SyntaxError: unexpected character after line continuation character

Similarly to import and from statements with modules, to fetch specific names from the sub_script.py module, we can use the from statement with packages as well.

[73]:
from MyMainPackage.MySubPackage.sub_script import X
[74]:
X
[74]:
23
[75]:
from MyMainPackage.MySubPackage.sub_script import sub_report
[76]:
sub_report()
I am a function inside sub_script
The value of the variable Y is: 5
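
To experiment with package imports without the lecture files, a small package can be generated on the fly. The sketch below builds a hypothetical directory tree (DemoMainPackage, DemoSubPackage, and demo_sub_script.py are made-up names that mirror the layout of MyMainPackage) inside a temporary directory, adds that directory to the search path, and then uses both import forms.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the package tree in a temporary directory
root = Path(tempfile.mkdtemp())
sub_dir = root / "DemoMainPackage" / "DemoSubPackage"
sub_dir.mkdir(parents=True)
(root / "DemoMainPackage" / "__init__.py").write_text("")
(sub_dir / "__init__.py").write_text("")
(sub_dir / "demo_sub_script.py").write_text(
    "X = 23\n"
    "\n"
    "def sub_report():\n"
    "    return 'I am a function inside demo_sub_script'\n"
)

# Make the directory that contains DemoMainPackage importable
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Dotted path = path through the directory hierarchy
import DemoMainPackage.DemoSubPackage.demo_sub_script
print(DemoMainPackage.DemoSubPackage.demo_sub_script.X)

# The from statement fetches specific names from the nested module
from DemoMainPackage.DemoSubPackage.demo_sub_script import sub_report
print(sub_report())
```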

Package __init__.py Files

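When using package imports, there is one more convention that we need to follow: each directory named within the path of a package import statement should contain a file named __init__.py. Such directories are called regular packages. (Since Python 3.3, a directory without an __init__.py file can still be imported as a namespace package, but regular packages with __init__.py files remain the common practice, and they are what we use in this lecture.)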

In the example we have been using, note that both the MyMainPackage and MySubPackage directories contain a file called __init__.py. The __init__.py name is special, as it declares that a directory is a Python package.

The __init__.py files are very often completely empty, and don’t contain any code. But, they can also contain Python code, just like other module files. In our MyMainPackage example, the __init__.py files are empty.

The __init__.py files are run automatically the first time a Python program imports a directory. Because of that, __init__.py files can be used to store code to initialize the state required by files in a package (e.g., to create required data files, open connections to databases, and so on).
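
The run-once behavior of __init__.py can be observed with a throwaway package. In this sketch, the package demo_init_pkg, its module worker.py, and the SETTINGS dictionary are hypothetical names; the package is written to a temporary directory and the printed output of the imports is captured so that we can count how many times the initialization code ran.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_init_pkg"
pkg_dir.mkdir()
# Initialization code placed in __init__.py
(pkg_dir / "__init__.py").write_text(
    "print('initializing demo_init_pkg')\n"
    "SETTINGS = {'ready': True}\n"
)
# A module in the package that relies on the initialized state
(pkg_dir / "worker.py").write_text(
    "from demo_init_pkg import SETTINGS\n"
    "\n"
    "def status():\n"
    "    return SETTINGS['ready']\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import demo_init_pkg.worker  # first import: __init__.py runs, then worker.py
    import demo_init_pkg.worker  # repeated import: nothing is rerun

init_runs = captured.getvalue().count("initializing demo_init_pkg")
print(init_runs)
print(demo_init_pkg.worker.status())
```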

On a separate note, don’t confuse __init__.py files in module packages with the __init__() class constructor method that we used before for specifying attributes of class instances. Both have initialization roles, but they are otherwise very different.

Difference Between from and import with Packages

The import statement can be somewhat inconvenient to use with packages, because we may have to retype the paths to the files and sub-directories frequently in our program. In our example, we must type the full path starting from MyMainPackage each time we want to reach the names in the sub_script.py file. Otherwise, we will get an error.

[77]:
sub_script.X
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[77], line 1
----> 1 sub_script.X

NameError: name 'sub_script' is not defined
[78]:
MySubPackage.sub_script.X
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[78], line 1
----> 1 MySubPackage.sub_script.X

NameError: name 'MySubPackage' is not defined
[79]:
MyMainPackage.MySubPackage.sub_script.X
[79]:
23
[80]:
# Use X in our code
print(MyMainPackage.MySubPackage.sub_script.X + 27)
print(MyMainPackage.MySubPackage.sub_script.X % 2)
print((MyMainPackage.MySubPackage.sub_script.X - 13)/2)
50
1
5.0

It is often more convenient to use the from statement with packages to avoid retyping the paths at each access.

[81]:
from MyMainPackage.MySubPackage.sub_script import X
X
[81]:
23
[82]:
print(X + 27)
print(X % 2)
print((X - 13)/2)
50
1
5.0

In addition, if we ever restructure or rename the directory tree, the from statement requires just one path update in the code, whereas the import statement may require updates in many lines in the code.

However, import can be advantageous when the same name is defined in two modules that are located in different directories and are used in the same program. With the from statement, the name can refer to only one of the two definitions at a time.

For example, in our MyMainPackage, there is a function sub_report in both main_script and sub_script. If we use the from statement, the name sub_report refers to whichever function was imported most recently, either from main_script or from sub_script.

[83]:
from MyMainPackage.MySubPackage.sub_script import sub_report
[84]:
sub_report()
I am a function inside sub_script
The value of the variable Y is: 5
[85]:
from MyMainPackage.main_script import sub_report
I am inside main_script, which is located in MyMainPackage
The value of the variable X is: 12
[86]:
# Name collision with the sub_report name used in the cell above
sub_report()
I am a function inside main_script
The value of the variable Z is: 6

But, with the import statement, we can use either of the two functions sub_report, because their names will involve their full path, and this way, the names will not clash. The only inconvenience is that we need to type the full paths to the two functions.

[87]:
import MyMainPackage.MySubPackage.sub_script
MyMainPackage.MySubPackage.sub_script.sub_report()
I am a function inside sub_script
The value of the variable Y is: 5
[88]:
import MyMainPackage.main_script
MyMainPackage.main_script.sub_report()
I am a function inside main_script
The value of the variable Z is: 6

Another alternative is to use the as extension, which will create unique synonyms for the names of the two functions. As we mentioned before, this extension is commonly used to provide short synonyms for longer names, and to avoid name clashes when we are already using a name in a script that would otherwise be overwritten by a regular import statement.

[89]:
from MyMainPackage.MySubPackage.sub_script import sub_report as sub_sub_report
sub_sub_report()
I am a function inside sub_script
The value of the variable Y is: 5
[90]:
from MyMainPackage.main_script import sub_report as main_sub_report
main_sub_report()
I am a function inside main_script
The value of the variable Z is: 6

Appendix: Modules and Packages Extras

Modules Usage Modes: __name__ and __main__

We mentioned that each module has a built-in attribute called __name__, which Python assigns automatically to all module objects. The attribute is assigned as follows:

- If the file is being imported by using the import statement, __name__ is set to the module’s name.
- If the file is being run as a top-level program file, __name__ is set to the string __main__.

Let’s check it with an example. The first line of the module file module_no_3.py prints the assigned attribute __name__, so that we can confirm the above.

As expected, __name__ is set to module_no_3 when the module is imported.

[91]:
# The module is imported
import module_no_3
Print the built-in attribute name of the module: module_no_3
The value of the variable X is: 1

When module_no_3 is run directly, __name__ is set to __main__.

[92]:
# The module is run by passing it as a command to the Python interpreter
!python module_no_3.py
Print the built-in attribute name of the module: __main__
The value of the variable X is: 1

Thus, a module can use the test if __name__ == '__main__' to determine whether it is being run as a top-level program or imported.

Therefore, if the module is the main script in a package and represents an entry point to a package that is run by the end-users (!python mainscript.py), the code after if __name__ == '__main__' in the main script will be executed when the main script is run. On the other hand, all other modules in the package will be imported. Any code under the if __name__ == '__main__' test in other modules will not be executed.

The __name__ test is also helpful during code development, for self-testing code that is written at the bottom of a file under the test. For instance, the file module_no_3a is similar to the file module_no_3, except that it includes several lines of code at the bottom, which test whether the function CelsiusToFahrenheit outputs the expected values. When the file is run as a command in the cell, the test if __name__ == '__main__' evaluates to True, and the lines that test the outputs of CelsiusToFahrenheit are run. Conversely, when the module file is imported, its variables and functions are imported, but the test evaluates to False, and the self-test lines are not run.

[93]:
!python module_no_3a.py
Print the built-in attribute name of the module: __main__
The value of the variable X is: 1
--------------------
Self-testing
100 degrees Fahrenheit is 37.77777777777778 degrees Celsius
32 degrees Fahrenheit is 0.0 degrees Celsius
0 degrees Fahrenheit is -17.77777777777778 degrees Celsius
[94]:
import module_no_3a
Print the built-in attribute name of the module: module_no_3a
The value of the variable X is: 1

The above code allows us to test the logic in our code without having to retype everything in a notebook cell or at the interactive command line each time we edit the file. Besides, the output of the self-test code will not appear every time this file is imported from another file.

Functions defined in files with the __name__ test can be run as standalone functions, and they can also be reused in other programs.
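
The two usage modes can be compared side by side in a single script. The following sketch is an assumption-laden stand-in for the module_no_3a example: the file demo_temperature.py, the function fahrenheit_to_celsius, and the flag SELF_TEST_RAN are hypothetical, and the standard library function runpy.run_path is used in place of the !python command to execute the file as a top-level program.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

source = '''
def fahrenheit_to_celsius(degrees_f):
    return (degrees_f - 32) * 5 / 9

SELF_TEST_RAN = False

if __name__ == "__main__":
    # Self-test code: executed only when the file is run as a program
    assert fahrenheit_to_celsius(32) == 0.0
    SELF_TEST_RAN = True
'''

root = Path(tempfile.mkdtemp())
module_file = root / "demo_temperature.py"
module_file.write_text(source)

# Mode 1: run the file as a top-level program (__name__ is "__main__")
run_globals = runpy.run_path(str(module_file), run_name="__main__")

# Mode 2: import the file as a module (__name__ is the module name)
sys.path.insert(0, str(root))
importlib.invalidate_caches()
import demo_temperature

print(run_globals["__name__"], run_globals["SELF_TEST_RAN"])
print(demo_temperature.__name__, demo_temperature.SELF_TEST_RAN)
print(demo_temperature.fahrenheit_to_celsius(212))
```

The self-test block runs only in the first mode, while the function remains available for reuse in the second.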

Reloading Modules

As we have seen, when we import a module, the code is executed only once when the module is imported the first time. Subsequent imports use the already loaded module object without reloading or rerunning the file’s code.

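To force a module’s code to be reloaded and rerun, you need to instruct Python to do so explicitly by calling the reload function. In Python 3, reload is not a built-in function; it is provided by the importlib standard library module (the older imp module, which also offered reload, is deprecated and was removed in Python 3.12). The reload function reruns a module file’s code and overwrites its existing namespace, rather than deleting the module object and re-creating it. Also, the reload function returns the module object, which is displayed as the output of the cell.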

[95]:
import my_module
[96]:
from importlib import reload
reload(my_module)
I am inside my_module
The value of the variable X is: 3
[96]:
<module 'my_module' from 'C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules\\my_module.py'>

Reloading is useful when we have made changes to a module file and want to examine their effect without restarting the session. In Jupyter notebooks, an alternative is to restart the kernel after editing the file; a subsequent import then executes the updated file, without using reload.
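
The difference between a repeated import and a reload can be sketched as follows. The module file demo_reload_module.py is hypothetical and is written to a temporary directory; the example edits the file after the first import and uses importlib.reload to pick up the change.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
module_file = root / "demo_reload_module.py"
module_file.write_text("X = 3\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_reload_module
value_first_import = demo_reload_module.X

# Edit the file after it has been imported
module_file.write_text("X = 3000  # edited after the first import\n")

# A repeated import reuses the loaded module object; the file is not rerun
import demo_reload_module
value_repeated_import = demo_reload_module.X

# reload reruns the file in the existing module object and returns that object
returned = importlib.reload(demo_reload_module)
value_after_reload = demo_reload_module.X

print(value_first_import, value_repeated_import, value_after_reload)
print(returned is demo_reload_module)
```

Note that names previously copied with from (e.g., from demo_reload_module import X) are not updated by reload, since they are separate bindings in the importing scope.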

Module Packages Reloading

Just like module files, an already imported directory needs to be passed to reload to force re-execution of the code. As shown, reload accepts a dotted path name to reload nested directories and files. Also, reload returns the module object in the displayed output of the cell.

[97]:
# Repeated import statements do not produce any output
import MyMainPackage.MySubPackage.sub_script
[98]:
from importlib import reload
reload(MyMainPackage.MySubPackage.sub_script)
I am inside sub_script, which is located in MySubPackage
The value of the variable X is: 23
[98]:
<module 'MyMainPackage.MySubPackage.sub_script' from 'C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules\\MyMainPackage\\MySubPackage\\sub_script.py'>

Once imported, sub_script becomes a module object nested in the object MySubPackage, which in turn is nested in the object MyMainPackage.

Similarly, MySubPackage is a module object that is nested in the object MyMainPackage.

[99]:
MyMainPackage.MySubPackage
[99]:
<module 'MyMainPackage.MySubPackage' from 'C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules\\MyMainPackage\\MySubPackage\\__init__.py'>

Python Path

If the directory that contains MyMainPackage is not the current working directory, then it may need to be added to the Python search path. To do that, either add the full path of that containing directory to the PYTHONPATH variable (by setting the Environment Variables on Windows systems), or add the path to a .pth file. Note that if a module or package is part of the standard library (e.g., random, time, sys, os), or if it is located in the site-packages directory (where third-party libraries are installed), it will be automatically found by Python, and it does not need to be added to the Python search path.

Alternatively, the path to the directory can be manually added using sys.path (that is, the path attribute of the standard library module sys). For instance, we can examine sys.path as shown in the following cell. Since sys.path is just a list of directories, we can manually add a directory by using the list append method.

[100]:
import sys
sys.path
[100]:
['C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules',
 'C:\\Users\\vakanski\\anaconda3\\python311.zip',
 'C:\\Users\\vakanski\\anaconda3\\DLLs',
 'C:\\Users\\vakanski\\anaconda3\\Lib',
 'C:\\Users\\vakanski\\anaconda3',
 '',
 'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages',
 'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages\\win32',
 'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages\\win32\\lib',
 'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages\\Pythonwin']
[101]:
sys.path.append('C:\\Users\\Alex\\Desktop\\python\\Lecture 6 Module Packages')
[102]:
sys.path
# The appended path is listed last
[102]:
['C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules',
 'C:\\Users\\vakanski\\anaconda3\\python311.zip',
 'C:\\Users\\vakanski\\anaconda3\\DLLs',
 'C:\\Users\\vakanski\\anaconda3\\Lib',
 'C:\\Users\\vakanski\\anaconda3',
 '',
 'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages',
 'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages\\win32',
 'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages\\win32\\lib',
 'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages\\Pythonwin',
 'C:\\Users\\Alex\\Desktop\\python\\Lecture 6 Module Packages']

Notice that the appended directory is now listed last in sys.path. However, this modification of sys.path is temporary and is valid only for the duration of the current session; the path is reset every time Jupyter Notebook is restarted or the notebook kernel is shut down. On the other hand, the path configuration in PYTHONPATH is permanent, and it persists after the current session is terminated.
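
The effect of appending to sys.path can be verified programmatically. In this sketch, the module demo_path_module.py is a hypothetical file placed in a temporary directory, and importlib.util.find_spec is used to ask Python whether the module can be located before and after the directory is appended.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
(root / "demo_path_module.py").write_text("X = 10\n")

# The directory is not on the search path yet, so the module cannot be found
found_before = importlib.util.find_spec("demo_path_module") is not None

# Append the directory that contains the module file
sys.path.append(str(root))
importlib.invalidate_caches()
found_after = importlib.util.find_spec("demo_path_module") is not None

import demo_path_module
print(found_before, found_after, demo_path_module.X)
print(sys.path[-1] == str(root))
```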

Package Relative Imports

To illustrate package relative imports in Python we will use the MyRelativeImportPackage which is similar to the MyMainPackage and contains several simple files.

MyRelativeImportPackage
    ├── __init__.py
    ├── relative_import_script_1.py
    ├── relative_import_script_2.py
    ├── relative_import_script_5.py
    ├── relative_import_script_6.py
    ├── script_1.py
    ├── script_2.py
    ├── script_3.py
    ├── script_4.py
    └── MySubPackage
        ├── __init__.py
        ├── relative_import_script_3.py
        ├── relative_import_script_4.py
        └── sub_script.py

When modules within a package need to import other names from other modules in the same package, it is still possible to use the full path syntax for importing, as we did in the above section. This is called an absolute import.

For instance, the relative_import_script_1.py in the first line imports script_1 by using the full name of the directory (i.e., from MyRelativeImportPackage import script_1).

[103]:
import MyRelativeImportPackage.relative_import_script_1
I am inside script_1, which is located in MyMainPackage
--------------------
I am inside the relative_import_scipt_1, which is located in MyMainPackage

However, package files can also make use of a special syntax to simplify import statements within the same package. Instead of spelling out the full path to the directory, Python allows us to use a leading dot . to refer to the current directory in the package.

Therefore, instead of using from MyRelativeImportPackage import script_1, we can use from . import script_1. This is implemented in the relative_import_script_2.py to import script_2.

This syntax is referred to as a relative import because the path to the module to be imported is relative to the directory in which the importing module is located.

The convenience of using relative imports is that we don’t need to write the name or the path of the current directory.

[104]:
import MyRelativeImportPackage.relative_import_script_2
I am inside script_2, which is located in MyMainPackage
--------------------
I am inside the relative_import_scipt_2, which is located in MyMainPackage

One more example is presented in the next cell, where the module relative_import_script_3.py is located in the directory MySubPackage and it imports the module sub_script which is located in the same directory by using the . syntax.

[105]:
import MyRelativeImportPackage.MySubPackage.relative_import_script_3
I am inside sub_script, which is located in MySubPackage
--------------------
I am inside the relative_import_scipt_3, which is located in MySubPackage

If we use the two-dot syntax .., a module can import another module that is located in the parent directory of the current package (i.e., the directory above). For example, relative_import_script_4.py is located in the MySubPackage directory, and it uses from .. import script_3 to import the script_3 module that is located in the parent directory of MySubPackage, that is, MyRelativeImportPackage.

[106]:
import MyRelativeImportPackage.MySubPackage.relative_import_script_4
I am inside script_3, which is located in MyMainPackage
--------------------
I am inside the relative_import_scipt_4, which is located in MySubPackage

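On the other hand, if we try to use only import script_3 instead of from . import script_3, the import fails. A plain import statement is always treated as an absolute import: Python searches for script_3 in the directories listed in sys.path, not in the directory of the importing package. To import modules located in the same package, we must use either the from statement with the dotted relative syntax or an absolute import that starts from the package name. This is illustrated in the example in the following cell.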

[107]:
import MyRelativeImportPackage.relative_import_script_5
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[107], line 1
----> 1 import MyRelativeImportPackage.relative_import_script_5

File ~\Documents\Codes\2023 Codes\Python for Data Science Course\Lectures_2023\Theme_1-Python_Programming\Lecture_6-Exceptions,_Modules\Posted\Lecture_6-Exceptions,_Modules\MyRelativeImportPackage\relative_import_script_5.py:1
----> 1 import script_3
      3 print(20*'-')
      4 print('I am inside the relative_import_scipt_5, which is located in MyMainPackage')

ModuleNotFoundError: No module named 'script_3'

Another way to use the relative imports is shown in the relative_import_script_6.py module, where the syntax from .script_4 import X is used to import the name X from the script_4 module which is located in the same directory as the importer module. This way, we can import specific names from modules in the same package.

[108]:
import MyRelativeImportPackage.relative_import_script_6
I am inside script_4, which is located in MyMainPackage
--------------------
I am inside the relative_import_scipt_6, which is located in MyMainPackage
The value of the variable X is 5
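
The relative import forms discussed in this section can be tried out on a generated package. The sketch below is not the MyRelativeImportPackage from the lecture: the package demo_rel_pkg, the sub-package demo_sub, and the modules demo_rel_helper, same_dir, child, and broken are hypothetical names created in a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_rel_pkg"
sub_dir = pkg_dir / "demo_sub"
sub_dir.mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("")
(sub_dir / "__init__.py").write_text("")
(pkg_dir / "demo_rel_helper.py").write_text("X = 5\n")
# One dot: import from the same directory as the importing module
(pkg_dir / "same_dir.py").write_text(
    "from . import demo_rel_helper\n"
    "from .demo_rel_helper import X\n"
)
# Two dots: import from the parent directory of the sub-package
(sub_dir / "child.py").write_text("from .. import demo_rel_helper\n")
# Plain import: treated as absolute, so the sibling module is not found
(pkg_dir / "broken.py").write_text("import demo_rel_helper\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_rel_pkg.same_dir
import demo_rel_pkg.demo_sub.child

try:
    import demo_rel_pkg.broken
    error_message = None
except ModuleNotFoundError as err:
    error_message = str(err)

print(demo_rel_pkg.same_dir.X)
print(demo_rel_pkg.demo_sub.child.demo_rel_helper is demo_rel_pkg.demo_rel_helper)
print(error_message)
```

Both relative forms resolve to the same module object, whereas the plain import fails because demo_rel_helper is searched for as a top-level module on sys.path.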

Absolute imports are often preferred because they are straightforward, and it is easy to tell exactly where the imported module or name is located, just by looking at the statement. But, they require more typing and writing full names and paths in the code.

One clear advantage of relative imports is that they are quite succinct, and they can turn a very long import statement into a simple and short statement. However, relative imports can be messy, particularly for projects where the organization of the directories is likely to change. Relative imports are also not as readable as absolute ones, and it is not easy to tell the location of the imported names.

References

  1. Mark Lutz, “Learning Python,” 5th edition, O'Reilly, 2013. ISBN: 978-1-449-35573-9.

  2. Pierian Data Inc., “Complete Python 3 Bootcamp,” codes available at: https://github.com/Pierian-Data/Complete-Python-3-Bootcamp.
