Lecture 6 - Exception Coding, Modules and Packages
6.1 Exception Coding
Most Python code contains errors when it is first written. For example, in the second cell below, the print function uses the name Var2 instead of the name of the defined variable var2. When we tried to run the cell, we got a NameError, with the further description that name 'Var2' is not defined. This message is specific enough for us to realize that the name used in the print function differs from the name we defined.
[1]:
var1 = 5
print(var1)
5
[2]:
var2 = 6
print(Var2)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[2], line 2
1 var2 = 6
----> 2 print(Var2)
NameError: name 'Var2' is not defined
Errors detected during code execution are called exceptions. In this example, NameError is an exception. When an exception occurs and is not handled, the Python interpreter stops the program and displays an error message.
Errors in Python are displayed in a specific form that provides: the traceback, the type of the exception, and the error message. Traceback is the sequence of function calls that led to the error. In the above example, the arrow indicates that the exception occurred in line 2 of the cell. This example is extremely simple, and in actual programs the traceback will list all modules and functions which led to the exception. Most often, you can just pay attention to the last level in the traceback, which is the actual place where the error occurred.
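To see a traceback with more than one level, the following minimal sketch calls one function from another and captures the traceback text with the standard traceback module. The function names parse_number and load_setting are hypothetical and used only for illustration; the try/except construct used here is explained in the next section.

```python
import traceback

def parse_number(text):
    # int() raises ValueError when text is not a valid integer literal
    return int(text)

def load_setting(raw_value):
    # Hypothetical helper that simply forwards the work to parse_number
    return parse_number(raw_value)

try:
    load_setting('abc')
except ValueError:
    # format_exc() returns the traceback text that Python would display
    report = traceback.format_exc()

print(report)
```

The printed report lists load_setting first and parse_number last, and its final line names the exception type and message, which is usually the first place to look.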
The try/except/else Statement
To handle exceptions in our programs, we can use try and except. This is also known as catching the exception. In the following cell, the code that can raise an exception is indented under the try header, and NameError is listed after except. If the exception occurs, the block indented under except is executed. Notice that this time the cell ran despite the error in our code, and only the message was printed.
[3]:
try:
    var2 = 6
    print(Var2)
except NameError:
    print('Oops, something went wrong!')
Oops, something went wrong!
If the try part succeeds (i.e., there are no errors in the block of indented statements under try), then the except part is not executed.
[4]:
try:
    var2 = 6
    print(var2)
except NameError:
    print('Oops, something went wrong!')
6
Similarly, if we try adding an integer and a string, this will result in a TypeError.
[5]:
123 + 'abc'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[5], line 1
----> 1 123 + 'abc'
TypeError: unsupported operand type(s) for +: 'int' and 'str'
We can again use try and except to catch this exception, only this time we use TypeError instead of NameError after the except keyword.
[6]:
try:
    123 + 'abc'
except TypeError:
    print('Oops, something went wrong!')
Oops, something went wrong!
What if we used NameError in the except clause instead of TypeError? The result is shown below. The exception was not caught this time, because when an exception is raised in the try block, Python tries to match its type with the types listed in the except clause. This means that we always need to list the correct exception type in order for the exception to be caught.
[7]:
try:
    123 + 'abc'
except NameError:
    print('Oops, something went wrong!')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[7], line 2
1 try:
----> 2 123 + 'abc'
3 except NameError:
4 print('Oops, something went wrong!')
TypeError: unsupported operand type(s) for +: 'int' and 'str'
But then, what if we are not sure about the type of exception that we expect to occur in our code? One solution is to add except clauses for both NameError and TypeError. This way, we can catch either a NameError or a TypeError exception.
[8]:
try:
    123 + 'abc'
except NameError:
    print('Oops, wrong name error!')
except TypeError:
    print('Oops, wrong type error!')
Oops, wrong type error!
[9]:
try:
    var2 = 6
    print(Var2)
except NameError:
    print('Oops, wrong name error!')
except TypeError:
    print('Oops, wrong type error!')
Oops, wrong name error!
Python allows us to place multiple except clauses under a single try statement to catch different exception types.
It is also possible to catch any of several exceptions by providing a tuple of exception types after the except keyword, as shown in the following example.
[10]:
try:
    123 + 'abc'
except (NameError, TypeError):
    print('Oops, wrong name or wrong type error!')
Oops, wrong name or wrong type error!
When there are multiple except clauses, the try block is executed line by line until the first exception is raised, and Python then runs the first except clause that matches it. In this example, that is the NameError exception, and the print line under except NameError is executed. The error in the line 123 + 'abc' is never reached, because the execution of the try block is interrupted as soon as the first exception occurs.
[11]:
try:
    var2 = 6
    print(Var2)
    123 + 'abc'
except TypeError:
    print('Oops, wrong type error!')
except SyntaxError:
    print('Oops, wrong syntax error!')
except NameError:
    print('Oops, wrong name error!')
except (IndexError, IndentationError):
    print('Oops, wrong index or indentation error!')
Oops, wrong name error!
Another alternative is to write only except without specifying any exception type. Such an empty (bare) except clause will catch all exception types, and with that, we don’t need to list the expected error types in the code.
[12]:
try:
    var2 = 6
    print(Var2)
except:
    print('Oops, something went wrong!')
Oops, something went wrong!
Despite this convenience, using the empty except clause is generally not recommended. One reason is that in the previous example we only learn that something went wrong with our code, but not what caused the error. This makes fixing the program difficult. In addition, the empty except clause also catches system-related exceptions that are not errors in our code (such as a system exit or a Ctrl+C interrupt). Even worse, it may hide genuine programming mistakes in our code for which we probably want to see an error message.
Therefore, it is better to be specific about what types of exceptions we want to catch and where, instead of catching everything we can in the whole program.
Similarly, writing Exception after the except keyword will catch almost all exceptions, much like an empty except clause. Unlike an empty except clause, except Exception does not catch system-related exceptions such as SystemExit and KeyboardInterrupt, and it is therefore somewhat preferred, but it should still be used with caution.
[13]:
try:
    var2 = 6
    print(Var2)
except Exception:
    print('Oops, something went wrong!')
Oops, something went wrong!
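The difference between except Exception and an empty except can be checked directly with issubclass. The sketch below is only an illustration; the helper name classify is hypothetical.

```python
# Ordinary errors derive from Exception; system-exiting ones do not.
print(issubclass(NameError, Exception))              # True
print(issubclass(KeyboardInterrupt, Exception))      # False
print(issubclass(KeyboardInterrupt, BaseException))  # True

def classify(exc_type):
    """Report which kind of handler catches an exception of type exc_type."""
    try:
        raise exc_type('demo')
    except Exception:
        return 'caught by except Exception'
    except BaseException:
        # Reached only for exceptions outside the Exception branch
        return 'caught only by an empty except or except BaseException'

print(classify(TypeError))
print(classify(SystemExit))
```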
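It is also possible to catch an exception and store it in a variable. In the following cell, the exception is bound to the name var3 by the as keyword and then stored in the variable my_error. The extra assignment is needed because Python removes the name var3 when the except block ends.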
[14]:
try:
    123 + 'abc'
except TypeError as var3:
    my_error = var3
[15]:
my_error
[15]:
TypeError("unsupported operand type(s) for +: 'int' and 'str'")
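Once an exception object is available, its type and message can be inspected like any other object. The following sketch shows a few commonly used pieces of information; the variable names are chosen only for illustration.

```python
try:
    123 + 'abc'
except TypeError as err:
    error_type = type(err).__name__  # name of the exception class
    error_text = str(err)            # the human-readable message
    error_args = err.args            # tuple of arguments given to the exception

print(error_type)  # TypeError
print(error_text)  # unsupported operand type(s) for +: 'int' and 'str'
print(error_args)
```

Storing these values inside the except block keeps them available afterward, since the name after as is cleared when the block ends.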
The syntax of the try/except statement can also include an optional else clause. The block of statements indented under else is executed only if no exception is raised in the try block.
In this example, there is an exception in the print(Var2) line, and because of that, the statement under except is executed and the else block is skipped.
[16]:
try:
    var2 = 6
    print(Var2)
    print('This message is not printed')
except NameError:
    print('Oops, something went wrong!')
else:
    print('The code is executed successfully, no exception occurred!')
Oops, something went wrong!
On the contrary, the following code does not raise an exception, and therefore, the statements under try are executed, and the statement under else is executed as well.
[17]:
try:
    var2 = 6
    print(var2)
    print('This message is printed')
except NameError:
    print('Oops, something went wrong!')
else:
    print('The code is executed successfully, no exception occurred!')
6
This message is printed
The code is executed successfully, no exception occurred!
The general syntax of the try/except/else statement is as shown below. It is a compound, multipart statement that starts with a try header. It is followed by one or more except blocks, which identify exceptions to be caught and blocks to process them. The else clause is optional, and it is listed after the except blocks; the else block runs if no exceptions are encountered. The words try, except, and else should be indented to the same level (vertically aligned).
try:
    Place your operations here.
    ...
    ...
except ExceptionI:
    If there is ExceptionI, then execute this block.
except ExceptionII:
    If there is ExceptionII, then execute this block.
except (ExceptionIII, ExceptionIV):
    If there is ExceptionIII or ExceptionIV, then execute this block.
except ExceptionV as Var1:
    If there is ExceptionV, store it in the variable Var1, and then execute this block.
except:
    If there are any other exceptions, then execute this block.
    ...
    ...
else:
    If there is no exception, then execute this block.
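As a concrete instance of this template, here is a small runnable sketch. The function name safe_divide and the returned messages are made up for illustration.

```python
def safe_divide(numerator, denominator):
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # Runs only when the denominator is zero
        return 'cannot divide by zero'
    except TypeError as err:
        # Runs when the operands do not support division
        return f'bad operand types: {err}'
    else:
        # Runs only when the try block raised no exception
        return result

print(safe_divide(10, 4))   # 2.5
print(safe_divide(1, 0))    # cannot divide by zero
print(safe_divide('a', 2))
```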
The finally Statement
The finally clause is another clause that can be combined with try. The general syntax is shown below. The goal is to always execute the block of code indented under finally, regardless of whether there was an exception in the try block or not.
try:
    Place your operations here
    ...
    ...
    Due to exceptions, these lines of code may be skipped.
finally:
    This code block is always executed, regardless of whether exceptions occurred.
The try/finally form is useful when we want to be completely sure that an action will happen after some code runs, regardless of the exception behavior of the program. In practice, this allows us to specify cleanup actions that must always occur, such as closing files or disconnecting from servers.
The next example opens a file named testfile for writing, then writes some text, and closes the file. The code under finally is executed.
[18]:
try:
    f = open('testfile', 'w')
    f.write('First sentence, second sentence, end')
    f.close()
finally:
    print('The finally code block is always executed')
The finally code block is always executed
Then, this cell reads the file.
[19]:
try:
    f = open('testfile', 'r')
    print(f.read())
    f.close()
finally:
    print('The finally code block is always executed')
First sentence, second sentence, end
The finally code block is always executed
For practice, let’s make an intentional mistake and try to open a file for reading that does not exist. As expected, we got a FileNotFoundError; however, the code under finally was still executed.
[20]:
try:
    f = open('wrongfile', 'r')
    print(f.read())
    f.close()
finally:
    print('The finally code block is always executed')
The finally code block is always executed
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[20], line 2
1 try:
----> 2 f = open('wrongfile', 'r')
3 print(f.read())
4 f.close()
File ~\anaconda3\Lib\site-packages\IPython\core\interactiveshell.py:284, in _modified_open(file, *args, **kwargs)
277 if file in {0, 1, 2}:
278 raise ValueError(
279 f"IPython won't let you open fd={file} by default "
280 "as it is likely to crash IPython. If you know what you are doing, "
281 "you can use builtins' open."
282 )
--> 284 return io_open(file, *args, **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: 'wrongfile'
The finally clause can also be combined with except and else. The logic remains the same, that is, the block under finally will always be executed. In this example the exception is caught, and the print statements under both except and finally are executed.
[21]:
try:
    f = open('wrongfile', 'r')
    print(f.read())
    f.close()
except FileNotFoundError:
    print('Oops, there is no such file')
finally:
    print('The finally code block is always executed')
Oops, there is no such file
The finally code block is always executed
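To make the order of execution explicit, the sketch below combines all four clauses and records which of them ran. It also moves the close call into finally, so the file is closed on every path. The function name read_first_line and the file names are hypothetical, and a temporary directory is used so that no files are left behind.

```python
import os
import tempfile

def read_first_line(path):
    events = []   # records which clauses were executed
    f = None
    try:
        f = open(path, 'r')
        line = f.readline()
    except FileNotFoundError:
        events.append('except')
        line = None
    else:
        events.append('else')
    finally:
        # Cleanup runs whether or not an exception occurred
        if f is not None:
            f.close()
        events.append('finally')
    return line, events

with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, 'demo.txt')
    with open(existing, 'w') as out:
        out.write('hello\n')
    found = read_first_line(existing)
    missing = read_first_line(os.path.join(tmp, 'missing.txt'))

print(found)    # ('hello\n', ['else', 'finally'])
print(missing)  # (None, ['except', 'finally'])
```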
Error Types
Besides the NameError and TypeError types above, let’s briefly look at several other common error types in Python.
SyntaxError occurs when there is a problem with the structure of the code in the program (in the example below we forgot the single quote at the end of the string; older Python versions report this case as an EOL, i.e., End-Of-Line, error). IndexError points to wrong indexing of sequences. IndentationError, FileNotFoundError, and ZeroDivisionError are self-explanatory.
[22]:
print('Hello world)
Cell In[22], line 1
print('Hello world)
^
SyntaxError: unterminated string literal (detected at line 1)
[23]:
list1 = [1, 2, 3]
list1[10]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[23], line 2
1 list1 = [1, 2, 3]
----> 2 list1[10]
IndexError: list index out of range
[24]:
def func1():
    msg = 'Hello world'
    print(msg)
        return msg
Cell In[24], line 4
return msg
^
IndentationError: unexpected indent
[25]:
myfile = open('newfile.txt', 'r')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[25], line 1
----> 1 myfile = open('newfile.txt', 'r')
File ~\anaconda3\Lib\site-packages\IPython\core\interactiveshell.py:284, in _modified_open(file, *args, **kwargs)
277 if file in {0, 1, 2}:
278 raise ValueError(
279 f"IPython won't let you open fd={file} by default "
280 "as it is likely to crash IPython. If you know what you are doing, "
281 "you can use builtins' open."
282 )
--> 284 return io_open(file, *args, **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: 'newfile.txt'
[26]:
10/0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
Cell In[26], line 1
----> 1 10/0
ZeroDivisionError: division by zero
[27]:
# The zero division error is not reported because the indentation error is detected first and the cell does not run
x = 0
    print(10/x)
Cell In[27], line 3
print(10/x)
^
IndentationError: unexpected indent
The main built-in exceptions in Python are shown below. Detailed explanations about each exception can be found here.
Note that the exceptions form a hierarchy: for instance, catching an ArithmeticError exception will also catch everything that is under it in the tree, i.e., FloatingPointError, OverflowError, and ZeroDivisionError. A few exceptions, like SystemExit and KeyboardInterrupt, are not in this tree because they derive directly from BaseException rather than from Exception; most of the time we shouldn’t catch these exceptions.
Exception
├── ArithmeticError
│   ├── FloatingPointError
│   ├── OverflowError
│   └── ZeroDivisionError
├── AssertionError
├── AttributeError
├── BufferError
├── EOFError
├── ImportError
├── LookupError
│   ├── IndexError
│   └── KeyError
├── MemoryError
├── NameError
│   └── UnboundLocalError
├── OSError
│   ├── BlockingIOError
│   ├── ChildProcessError
│   ├── ConnectionError
│   │   ├── BrokenPipeError
│   │   ├── ConnectionAbortedError
│   │   ├── ConnectionRefusedError
│   │   └── ConnectionResetError
│   ├── FileExistsError
│   ├── FileNotFoundError
│   ├── InterruptedError
│   ├── IsADirectoryError
│   ├── NotADirectoryError
│   ├── PermissionError
│   ├── ProcessLookupError
│   └── TimeoutError
├── ReferenceError
├── RuntimeError
│   └── NotImplementedError
├── StopIteration
├── SyntaxError
│   └── IndentationError
│       └── TabError
├── SystemError
├── TypeError
├── ValueError
│   └── UnicodeError
│       ├── UnicodeDecodeError
│       ├── UnicodeEncodeError
│       └── UnicodeTranslateError
└── Warning
    ├── BytesWarning
    ├── DeprecationWarning
    ├── FutureWarning
    ├── ImportWarning
    ├── PendingDeprecationWarning
    ├── ResourceWarning
    ├── RuntimeWarning
    ├── SyntaxWarning
    ├── UnicodeWarning
    └── UserWarning
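The parent/child relations in the tree can be verified with issubclass, and catching a parent class also catches its children. A short sketch:

```python
print(issubclass(ZeroDivisionError, ArithmeticError))  # True
print(issubclass(FileNotFoundError, OSError))          # True
print(issubclass(KeyError, LookupError))               # True

# __mro__ lists a class followed by all of its ancestors
print(IndexError.__mro__)

try:
    10 / 0
except ArithmeticError as err:
    # The handler names the parent class, but the actual type is the child
    caught = type(err).__name__

print(caught)  # ZeroDivisionError
```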
The raise Statement
In Python, we can also trigger exceptions and create error messages manually. This is known as raising an exception, and it is coded with the raise keyword followed by the exception and an optional error message.
The general syntax is as follows:
if test_condition:
    raise Exception(Message)
In the following example, we raise an exception and stop the program if x is less than 0. In the parentheses, we specified the text that is to be displayed in the error message.
[28]:
x = -1

if x < 0:
    raise Exception('Sorry, no numbers below zero')
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
Cell In[28], line 4
1 x = -1
3 if x < 0:
----> 4 raise Exception('Sorry, no numbers below zero')
Exception: Sorry, no numbers below zero
We can also specify the type of exception to raise after the raise keyword, such as TypeError in the next example.
[29]:
y = 'hello'
type(y)
[29]:
str
[30]:
if type(y) is not int:
    raise TypeError('Only integers are allowed')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[30], line 2
1 if type(y) is not int:
----> 2 raise TypeError('Only integers are allowed')
TypeError: Only integers are allowed
When the test condition is False, the raise statement indented under if is not executed, and no exception is raised.
[31]:
y = 3
if type(y) is not int:
    raise TypeError('Only integers are allowed')
We can also create exception objects ahead of time, and use them afterward in our code.
[32]:
my_exception = TypeError('Sorry, the input should be an integer number')

z = 'one'

if type(z) is not int:
    raise my_exception
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[32], line 6
3 z = 'one'
5 if type(z) is not int:
----> 6 raise my_exception
TypeError: Sorry, the input should be an integer number
In the above cell, my_exception is in fact an instance of the class TypeError. The error message that we typed above is stored in the created instance of the TypeError class (in its args attribute).
Additionally, the raise statement can be used alone, without an exception name. In that case, it simply re-raises the exception that is currently being handled. This form is typically used when we need to catch and handle an exception but don’t want it to be silenced by our handler.
Consider the following example where except catches a ZeroDivisionError.
[33]:
a = 10
b = 0
try:
    print(a/b)
except ZeroDivisionError:
    print('Oops, something went wrong')
Oops, something went wrong
Adding a raise statement alone at the end of the except block causes the exception to be re-raised.
[34]:
a = 10
b = 0
try:
    print(a/b)
except ZeroDivisionError:
    print('Oops, something went wrong')
    raise
Oops, something went wrong
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
Cell In[34], line 4
2 b = 0
3 try:
----> 4 print(a/b)
5 except ZeroDivisionError:
6 print('Oops, something went wrong')
ZeroDivisionError: division by zero
Here is another example, where the function quad calculates the roots of a quadratic function with coefficients a, b, and c. The function raises the user-defined exception QuadError if the function is not quadratic, or if it does not have real roots. The raise statement allows us to introduce application-specific errors in our code.
[35]:
import math

class QuadError(Exception): pass
def quad(a, b, c):
    if a == 0:
        raise QuadError('Not quadratic')

    if b*b-4*a*c < 0:
        raise QuadError('No real roots')

    x1 = (-b+math.sqrt(b*b-4*a*c))/(2*a)
    x2 = (-b-math.sqrt(b*b-4*a*c))/(2*a)
    return (x1, x2)
[36]:
x1, x2 = quad(3, 4, 4)
print("Roots are", x1, x2)
---------------------------------------------------------------------------
QuadError Traceback (most recent call last)
Cell In[36], line 1
----> 1 x1, x2 = quad(3, 4, 4)
2 print("Roots are", x1, x2)
Cell In[35], line 9, in quad(a, b, c)
6 raise QuadError('Not quadratic')
8 if b*b-4*a*c < 0:
----> 9 raise QuadError('No real roots')
11 x1 = (-b+math.sqrt(b*b-4*a*c))/(2*a)
12 x2 = (-b-math.sqrt(b*b-4*a*c))/(2*a)
QuadError: No real roots
[37]:
x1, x2 = quad(1, -5, 6)
print("Roots are", x1, x2)
Roots are 3.0 2.0
[38]:
x1, x2 = quad(0, -5, 6)
print("Roots are", x1, x2)
---------------------------------------------------------------------------
QuadError Traceback (most recent call last)
Cell In[38], line 1
----> 1 x1, x2 = quad(0, -5, 6)
2 print("Roots are", x1, x2)
Cell In[35], line 6, in quad(a, b, c)
4 def quad(a, b, c):
5 if a == 0:
----> 6 raise QuadError('Not quadratic')
8 if b*b-4*a*c < 0:
9 raise QuadError('No real roots')
QuadError: Not quadratic
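Because QuadError is an ordinary exception class, callers can catch it with try/except instead of letting the program stop. The sketch below repeats the definitions so that it runs on its own; the wrapper function describe_roots is a hypothetical addition for illustration.

```python
import math

class QuadError(Exception):
    pass

def quad(a, b, c):
    if a == 0:
        raise QuadError('Not quadratic')
    discriminant = b*b - 4*a*c
    if discriminant < 0:
        raise QuadError('No real roots')
    root = math.sqrt(discriminant)
    return ((-b + root) / (2*a), (-b - root) / (2*a))

def describe_roots(a, b, c):
    # Catch only the application-specific error; other errors still propagate
    try:
        x1, x2 = quad(a, b, c)
    except QuadError as err:
        return f'Cannot solve: {err}'
    return f'Roots are {x1} {x2}'

print(describe_roots(1, -5, 6))  # Roots are 3.0 2.0
print(describe_roots(3, 4, 4))   # Cannot solve: No real roots
print(describe_roots(0, -5, 6))  # Cannot solve: Not quadratic
```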
The assert Statement
The assert statement is similar to the raise statement, and it can be thought of as a conditional raise statement.
The general syntax is:
assert test_condition, message(optional)
If the test condition evaluates to False, Python raises an AssertionError exception. If the message item is provided, it is used as the error message in the displayed exception.
Conversely, if the test condition evaluates to True, assert does nothing and the program continues to the next line.
Like all exceptions, the AssertionError exception will terminate the program if it’s not caught with a try statement.
In the following example, assert is used to ensure that the value of Temperature is non-negative.
[39]:
def KelvinToFahrenheit(Temperature):
    assert Temperature >= 0, 'Colder than absolute zero!'
    return ((Temperature-273)*1.8)+32
[40]:
KelvinToFahrenheit(273)
[40]:
32.0
[41]:
KelvinToFahrenheit(-10)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[41], line 1
----> 1 KelvinToFahrenheit(-10)
Cell In[39], line 2, in KelvinToFahrenheit(Temperature)
1 def KelvinToFahrenheit(Temperature):
----> 2 assert Temperature >= 0, 'Colder than absolute zero!'
3 return ((Temperature-273)*1.8)+32
AssertionError: Colder than absolute zero!
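The next two cells show the same check written first with the raise statement and then with the assert statement. The only difference is the exception type: Exception in the first cell and AssertionError in the second.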
[42]:
x = -1

if x < 0:
    raise Exception('Sorry, no numbers below zero')
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
Cell In[42], line 4
1 x = -1
3 if x < 0:
----> 4 raise Exception('Sorry, no numbers below zero')
Exception: Sorry, no numbers below zero
[43]:
x = -3
assert x >= 0, 'Sorry, no numbers below zero'
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[43], line 3
1 x = -3
----> 3 assert x >= 0, 'Sorry, no numbers below zero'
AssertionError: Sorry, no numbers below zero
Here is one more simple example, where assert is used to ensure that no empty list is passed as marks to the function average.
[44]:
def average(marks):
    assert len(marks) != 0, 'List is empty'
    return sum(marks)/len(marks)
[45]:
marks1 = [55, 88, 78, 90, 79]
print('Average of marks is:', average(marks1))
Average of marks is: 78.0
[46]:
marks2 = []
print('Average of marks is:', average(marks2))
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[46], line 2
1 marks2 = []
----> 2 print('Average of marks is:', average(marks2))
Cell In[44], line 2, in average(marks)
1 def average(marks):
----> 2 assert len(marks) != 0, 'List is empty'
3 return sum(marks)/len(marks)
AssertionError: List is empty
The assert statement is typically used to verify program conditions during development (such as user-defined constraints), rather than for catching genuine programming errors. Because Python detects such errors itself, there is usually no need to use assert to catch things like zero divides, out-of-bounds indexes, type mismatches, etc. In general, assertions are useful for checking types, classes, or values of input variables, checking data structures (e.g., for duplicates in a list or contradictory variables), and checking that outputs of functions are reasonable and as expected.
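One practical consequence of treating assert as a development aid is that Python skips assert statements when it is started with the -O option. The following sketch, which is one possible variant of the average function and not part of the original notebook, keeps a development-time type check as an assert and uses an explicit raise for the check that must always run.

```python
def average(marks):
    # Development-time check: skipped when Python runs with the -O option
    assert all(isinstance(m, (int, float)) for m in marks), 'marks must be numbers'

    # Validation that must always happen is written as an explicit raise
    if len(marks) == 0:
        raise ValueError('List is empty')
    return sum(marks) / len(marks)

print(average([55, 88, 78, 90, 79]))  # 78.0

try:
    average([])
except ValueError as err:
    empty_message = str(err)

print(empty_message)  # List is empty
```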
6.2 Module Coding Basics
Every Python file with code is referred to as a module. To create a module, we don’t need any special syntax to tell Python that we are making a module. We can simply use any text editor to type Python code into a text file, and save it with a .py extension; any such file is automatically considered a Python module.
For example, I have created a simple file called my_module.py that is saved in the same directory as this Jupyter notebook. The module does not do anything useful; it just defines a few names and prints a few statements. The code inside my_module.py is shown below.
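The original listing of my_module.py is not reproduced in this text. The following is a hypothetical reconstruction that is consistent with the outputs shown in the rest of this section; the actual file may differ, in particular in how the values of Z and U are computed.

```python
# my_module.py (hypothetical reconstruction, not the original file)

print('I am inside my_module')

X = 3
Y = 5
print('The value of the variable X is:', X)

def sub_report():
    Z = X + Y  # assumption: any expression that yields 8 matches the output
    print('The value of the variable Z is:', Z)
    print('I am a function named sub_report')

def main_report():
    U = 10     # assumption: only the printed value 10 is known
    print('The value of the variable U is:', U)
    print('I am a function named main_report')
```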
Module names follow the same rules as other variable names in Python: they can contain only letters, digits, and underscores, and they cannot be Python-reserved keywords (e.g., a module file cannot be named if.py).
The import Statement
Python programs can use the module file we have created by running an import or from statement. These statements find, compile, and run a module file’s code. The main difference is that import fetches the module as a whole, while from fetches specific names out of the module.
Let’s import my_module. Python executes the statements in the module file one after another, from the top of the file to the bottom. For this module, the two print statements at the top level of the file are executed. The print statements inside the two functions (main_report and sub_report) are not executed; they will be executed only when the functions are called.
[47]:
import my_module
I am inside my_module
The value of the variable X is: 3
Note that we don’t use the .py extension for the files with the import statement (i.e., import my_module.py will raise an exception).
When the module is imported, a new module object is created. The module object is shown below, where Python mapped the module name to an external filename by adding a directory path from the module search path to the file, and a .py extension at the end.
[48]:
# The name my_module refers to the loaded module object
my_module
[48]:
<module 'my_module' from 'C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules\\my_module.py'>
Overall, the name my_module serves two different purposes: 1. It identifies the external file my_module.py that needs to be loaded. 2. After the module is loaded, it becomes a reference to the module object.
During importing, all the names assigned at the top level of the module become attributes of the module object. In this example, the variables X and Y and the functions sub_report and main_report become attributes of the module, and we can access them by using the object.attribute syntax (a.k.a. qualification).
[49]:
my_module.X
[49]:
3
[50]:
my_module.Y
[50]:
5
[51]:
my_module.sub_report()
The value of the variable Z is: 8
I am a function named sub_report
The from Statement
The from statement fetches specific names from the module and allows us to use the names directly (without the need for module_object.attribute). This way, we can use the names in the module with less typing.
[52]:
from my_module import X
X
[52]:
3
The from statement in effect copies the names out of the module into another scope; in this case, into the scope of this Jupyter notebook, where the from statement appears.
When we run a from statement, Python internally first imports the entire module file as usual, then copies the specific names out of the module, and finally discards the reference to the module name; the module file itself is not deleted, and the module stays loaded. This is similar to the following code:
import my_module
X = my_module.X
del my_module
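The same three steps can be tried with a standard-library module. The sketch below uses math in place of my_module, and checks sys.modules to show that removing the name does not unload the module.

```python
import sys

import math
sqrt = math.sqrt   # copy one name into the current scope
del math           # removes only the name 'math' from this scope

print(sqrt(16.0))  # 4.0

# The module object is still cached by the interpreter
module_still_loaded = 'math' in sys.modules
print(module_still_loaded)  # True
```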
With from, we can also import several names at the same time, separated by commas.
[53]:
from my_module import X, Y, sub_report
[54]:
sub_report()
The value of the variable Z is: 8
I am a function named sub_report
Another alternative is to use a * instead of specific names, which fetches all names assigned at the top level of the referenced module. The following code fetches all four names in our module: X, Y, sub_report, and main_report. Note again that the names Z and U are not defined at the top level of the module but inside the functions, and therefore they cannot be fetched with the from statement.
[55]:
from my_module import *
main_report()
The value of the variable U is: 10
I am a function named main_report
[56]:
U
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[56], line 1
----> 1 U
NameError: name 'U' is not defined
One problem with using from module import * is that it can silently overwrite variables that have the same name as existing variables in our scope.
In the following example, we have a variable X = 15, which was overwritten by the variable X with the same name in my_module, which has the value 3. The way this variable was overwritten may not be obvious (e.g., in large modules with many variables we cannot remember and keep track of all variable names).
[57]:
X = 15
from my_module import *
print(X)
3
On the other hand, if we use import, all names will be defined only within the scope of the module, and the names will not collide with other names in our programs.
[58]:
X = 15
import my_module
# The print statements were not displayed this time (the reason is explained in the Appendix)
[59]:
print(X)
print(my_module.X)
15
3
Therefore, programmers need to be careful when using the from statement (especially with *), and the plain import statement should be preferred. However, from also provides the convenience of less typing, and it is still very commonly used.
When Using import is Required
When the same name of a variable or function is defined in two different modules, and we need to use both of the names at the same time, then we must use the import statement.
For instance, let’s assume that another module file named module_no_2.py also contains a variable X and a function main_report.
Using import we can load the two different variables X, because including the name of the enclosing module makes the two names unique.
[60]:
import my_module # already imported earlier, so its code is not executed again
import module_no_2 # imported for the first time, so its code is executed
I am inside module_no_2
The value of the variable X is: 22
[61]:
print(my_module.X)
print(module_no_2.X)
3
22
The same holds for the function main_report, which appears in both modules.
[62]:
my_module.main_report()
module_no_2.main_report()
The value of the variable U is: 10
I am a function named main_report
The value of the variable Y is: 15
I am a function named main_report
In this case, the from statement does not give us both values, because only one assignment to the name X can exist in the scope: the second from statement silently overwrites the first.
[63]:
# Only one variable name X can exist at one time
from my_module import X
from module_no_2 import X
print(X)
22
Another way to resolve the name clash is to use the as extension to from/import, which allows us to import a name under another name that will be used as a synonym.
[64]:
from my_module import X as X1
from module_no_2 import X as X2
print(X1)
print(X2)
3
22
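The same technique applies to standard-library modules. For instance, both math and cmath define a function named sqrt, and the as extension lets us keep both; the alias names below are arbitrary choices for illustration.

```python
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt

print(real_sqrt(4))      # 2.0
print(complex_sqrt(-4))  # 2j
```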
Module Namespaces
Modules can be understood as places where collections of names are defined that we want to make visible to the rest of our code. These collections of names live in the module’s namespace and represent the attributes of the module object.
To access the namespace of the my_module object, we can use the built-in dir function. We can see the names we assigned in the module file: X, Y, main_report, and sub_report. However, Python also adds some names to the module’s namespace for us; for instance, __file__ gives the path to the file the module was loaded from, and __name__ gives the module name.
[65]:
dir(my_module)
[65]:
['X',
'Y',
'__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'main_report',
'sub_report']
Internally, the module namespaces created by imports are stored as dictionary objects. Module namespaces can also be accessed through the built-in __dict__ attribute associated with module objects, where the names are dictionary keys.
[66]:
my_module.__dict__.keys()
[66]:
dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__file__', '__cached__', '__builtins__', 'X', 'Y', 'sub_report', 'main_report'])
[67]:
my_module.__dict__['__file__']
[67]:
'C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules\\my_module.py'
[68]:
my_module.__dict__['__name__']
[68]:
'my_module'
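The relationship between dir, attribute access, and __dict__ can be checked on a small module object built in memory. This is a minimal sketch under stated assumptions: demo_module and its contents are hypothetical and only mimic the kind of names defined in my_module.

```python
import types

# Hypothetical module object created in memory (not loaded from a file)
demo_module = types.ModuleType("demo_module")
demo_module.X = 3

def main_report():
    return "I am a function named main_report"

demo_module.main_report = main_report

# dir() lists the attribute names; filter out the names Python adds for us
user_names = [name for name in dir(demo_module) if not name.startswith("__")]
print(user_names)                      # ['X', 'main_report']

# Attribute access and dictionary lookup reach the same object
print(demo_module.X)                   # 3
print(demo_module.__dict__["X"])       # 3
print(demo_module.__dict__["__name__"])  # demo_module
```

Because the namespace is an ordinary dictionary, qualifying a name as demo_module.X amounts to a dictionary lookup with the key 'X'.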
6.3 Module Packages
When we create programs in Python, it is helpful to organize the individual module files related to an application into sub-directories. A directory of Python code is called a package, or a module package. Importing a directory is known as a package import.
For example, consider the directory MyMainPackage
which is located in the same directory as this Jupyter notebook.
MyMainPackage
├── __init__.py
├── main_script
└── MySubPackage
    ├── __init__.py
    └── sub_script.py
To import the module file sub_script.py
which is located inside the directory MySubPackage
, we can use the dotted syntax shown in the following cell MyMainPackage.MySubPackage.sub_script
. In effect, this turns the directory MyMainPackage
into a Python namespace, which has attributes corresponding to the sub-directories and module files that the directory contains.
[69]:
import MyMainPackage.MySubPackage.sub_script
I am inside sub_script, which is located in MySubPackage
The value of the variable X is: 23
As we learned previously, import
fetches a module as a whole, and the names (variables and functions) that are defined in the module sub_script.py
become attributes of the imported object. These include the variable X
and the function sub_report
.
[70]:
MyMainPackage.MySubPackage.sub_script.X
[70]:
23
[71]:
MyMainPackage.MySubPackage.sub_script.sub_report()
I am a function inside sub_script
The value of the variable Y is: 5
The dotted path in the cell corresponds to the path through the directory hierarchy that leads to the module file sub_script.py
, i.e., MyMainPackage\MySubPackage\sub_script.py
.
On the other hand, note that a path written with backslashes does not work with the import
statement; the dotted syntax is required regardless of the operating system.
[72]:
import MyMainPackage\MySubPackage\sub_script
Cell In[72], line 1
import MyMainPackage\MySubPackage\sub_script
^
SyntaxError: unexpected character after line continuation character
Similarly to import
and from
statements with modules, to fetch specific names from the sub_script.py
module, we can use the from
statement with packages as well.
[73]:
from MyMainPackage.MySubPackage.sub_script import X
[74]:
X
[74]:
23
[75]:
from MyMainPackage.MySubPackage.sub_script import sub_report
[76]:
sub_report()
I am a function inside sub_script
The value of the variable Y is: 5
Package __init__.py Files
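When using package imports, there is one more convention that we need to follow: each directory named within the path of a package import statement should contain a file named __init__.py
. Since Python 3.3, a directory without this file can still be imported as a namespace package, but namespace packages behave differently from regular packages, so it is safest to always include the file.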
In the example we have been using, note that both the MyMainPackage
and MySubPackage
directories contain a file called __init__.py
. The name __init__.py
is special, as it declares that a directory is a regular Python package.
The __init__.py
files are very often completely empty, and don’t contain any code. But, they can also contain Python code, just like other module files. In our MyMainPackage
example, the __init__.py
files are empty.
The __init__.py
files are run automatically the first time a Python program imports a directory. Because of that, __init__.py
files can be used to store code to initialize the state required by files in a package (e.g., to create required data files, open connections to databases, and so on).
On a separate note, don’t confuse __init__.py
files in module packages with the __init__()
class constructor method that we used before for specifying attributes of class instances. Both have initialization roles, but they are otherwise very different.
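To see that __init__.py files are executed on a package import, the following sketch builds a small package in a temporary directory and imports a nested module from it. All names here (demo_main_package, demo_sub_package, INITIALIZED) are hypothetical and chosen only for illustration; the layout mirrors MyMainPackage but is not the lecture's actual package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout created in a temporary directory:
#   demo_main_package/__init__.py
#   demo_main_package/demo_sub_package/__init__.py
#   demo_main_package/demo_sub_package/sub_script.py
root = Path(tempfile.mkdtemp())
main_dir = root / "demo_main_package"
sub_dir = main_dir / "demo_sub_package"
sub_dir.mkdir(parents=True)

# The top-level __init__.py holds initialization code; the nested one is empty
(main_dir / "__init__.py").write_text("INITIALIZED = True\n")
(sub_dir / "__init__.py").write_text("")
(sub_dir / "sub_script.py").write_text("X = 23\n")

# Make the directory that CONTAINS the package visible to the import system
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_main_package.demo_sub_package.sub_script

# Names assigned in __init__.py become attributes of the package object
print(demo_main_package.INITIALIZED)                      # True
print(demo_main_package.demo_sub_package.sub_script.X)    # 23
```

The import statement never mentions __init__.py, yet the name INITIALIZED exists on the package object afterward, which shows that the file was run as part of importing the directory.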
Difference Between from and import with Packages
The import
statement can be somewhat inconvenient to use with packages, because we may have to retype the paths to the files and sub-directories frequently in our program. In our example, we must type the full path, starting from MyMainPackage
each time we want to reach the names in the sub_script.py
file. Otherwise, we will get an error.
[77]:
sub_script.X
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[77], line 1
----> 1 sub_script.X
NameError: name 'sub_script' is not defined
[78]:
MySubPackage.sub_script.X
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[78], line 1
----> 1 MySubPackage.sub_script.X
NameError: name 'MySubPackage' is not defined
[79]:
MyMainPackage.MySubPackage.sub_script.X
[79]:
23
[80]:
# Use X in our code
print(MyMainPackage.MySubPackage.sub_script.X + 27)
print(MyMainPackage.MySubPackage.sub_script.X % 2)
print((MyMainPackage.MySubPackage.sub_script.X -13)/2)
50
1
5.0
It is often more convenient to use the from
statement with packages to avoid retyping the paths at each access.
[81]:
from MyMainPackage.MySubPackage.sub_script import X
X
[81]:
23
[82]:
print(X + 27)
print(X % 2)
print((X - 13)/2)
50
1
5.0
In addition, if we ever restructure or rename the directory tree, the from
statement requires just one path update in the code, whereas the import
statement may require updates in many lines in the code.
However, import
can be advantageous if there are two modules with the same name that are located in different directories, and are used in the same program. With the from
statement, we can reach only one of the two modules at a time.
For example, in our MyMainPackage
, there is a function sub_report
in both the main_script
and sub_script
. If we use the from
statement, the name sub_report
will refer to whichever function was imported last, the one from main_script
or the one from sub_script
.
[83]:
from MyMainPackage.MySubPackage.sub_script import sub_report
[84]:
sub_report()
I am a function inside sub_script
The value of the variable Y is: 5
[85]:
from MyMainPackage.main_script import sub_report
I am inside main_script, which is located in MyMainPackage
The value of the variable X is: 12
[86]:
# Name collision with the sub_report name used in the cell above
sub_report()
I am a function inside main_script
The value of the variable Z is: 6
But, with the import
statement, we can use either of the two functions sub_report
, because their names will involve their full path, and this way, the names will not clash. The only inconvenience is that we need to type the full paths to the two functions.
[87]:
import MyMainPackage.MySubPackage.sub_script
MyMainPackage.MySubPackage.sub_script.sub_report()
I am a function inside sub_script
The value of the variable Y is: 5
[88]:
import MyMainPackage.main_script
MyMainPackage.main_script.sub_report()
I am a function inside main_script
The value of the variable Z is: 6
Another alternative is to use the as
extension, which will create unique synonyms for the names of the two functions. As we mentioned before, this extension is commonly used to provide short synonyms for longer names, and to avoid name clashes when we are already using a name in a script that would otherwise be overwritten by a regular import
statement.
[89]:
from MyMainPackage.MySubPackage.sub_script import sub_report as sub_sub_report
sub_sub_report()
I am a function inside sub_script
The value of the variable Y is: 5
[90]:
from MyMainPackage.main_script import sub_report as main_sub_report
main_sub_report()
I am a function inside main_script
The value of the variable Z is: 6
Appendix: Modules and Packages Extras¶
Module Usage Modes: __name__ and __main__
We mentioned that each module has a built-in attribute called __name__
, which Python assigns automatically to all module objects. The attribute is assigned as follows:
- If the file is being imported by using the import
statement, __name__
is set to the module’s name.
- If the file is being run as a top-level program file, __name__
is set to the string __main__
.
Let’s check it with an example. The module file module_no_3
is shown below, and note that in the first line we will print the assigned attribute __name__
to confirm that the above is correct.
As expected, __name__
is set to module_no_3
when the module is imported.
[91]:
# The module is imported
import module_no_3
Print the built-in attribute name of the module: module_no_3
The value of the variable X is: 1
When module_no_3
is run directly, __name__
is set to __main__
.
[92]:
# The module is run by passing it as a command to the Python interpreter
!python module_no_3.py
Print the built-in attribute name of the module: __main__
The value of the variable X is: 1
Thus, the __name__
attribute can be used in the if
test if __name__ == '__main__'
to determine whether a file is being run as a program or imported as a module.
Therefore, if the module is the main script in a package and represents an entry point to a package that is run by the end-users (!python mainscript.py
), the code after if __name__ == '__main__'
in the main script will be executed when the main script is run. On the other hand, all other modules in the package will be imported. Any code under the if __name__ == '__main__'
test in other modules will not be executed.
Another reason why this test is helpful is that, during code development, self-testing code can be written at the bottom of a file under the __name__
test. For instance, the file module_no_3a
is similar to the file module_no_3
, except that it includes several lines of code at the bottom, which test whether the function CelsiusToFahrenheit
outputs the expected values. When the file is run as a command in the cell, the condition if __name__ == '__main__'
is True, and the lines that test the outputs of
CelsiusToFahrenheit
are run. Conversely, when the module file is imported, its variables and functions are imported, but the condition if __name__ == '__main__'
is False, and the lines that test the outputs of CelsiusToFahrenheit
are not run.
[93]:
!python module_no_3a.py
Print the built-in attribute name of the module: __main__
The value of the variable X is: 1
--------------------
Self-testing
100 degrees Fahrenheit is 37.77777777777778 degrees Celsius
32 degrees Fahrenheit is 0.0 degrees Celsius
0 degrees Fahrenheit is -17.77777777777778 degrees Celsius
[94]:
import module_no_3a
Print the built-in attribute name of the module: module_no_3a
The value of the variable X is: 1
The above code allows us to test the logic in our code without having to retype everything in a notebook cell or at the interactive command line each time we edit the file. Besides, the output of the self-test call will not appear every time this file is imported from another file.
Functions defined in files with the __name__
test can be run as standalone functions, and they can also be reused in other programs.
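The two usage modes can be compared side by side in a single script. The sketch below is an illustration under stated assumptions: the file demo_name_module.py, the function fahrenheit_to_celsius, and the flag SELF_TEST_RAN are hypothetical and are not the contents of module_no_3a. Running the file as a program is simulated with runpy.run_path and run_name="__main__", which sets __name__ as the interpreter would for a top-level script.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical module with a self-test under the __name__ test
source = '''
def fahrenheit_to_celsius(deg_f):
    return (deg_f - 32) * 5 / 9

SELF_TEST_RAN = False

if __name__ == "__main__":
    # Executed only when the file is run as the top-level program
    SELF_TEST_RAN = True
    assert fahrenheit_to_celsius(32) == 0.0
'''

folder = Path(tempfile.mkdtemp())
module_file = folder / "demo_name_module.py"
module_file.write_text(source)

# Mode 1: run the file as a program; run_path returns the module's globals
run_globals = runpy.run_path(str(module_file), run_name="__main__")
print(run_globals["__name__"], run_globals["SELF_TEST_RAN"])   # __main__ True

# Mode 2: import the same file as a module
sys.path.insert(0, str(folder))
importlib.invalidate_caches()
import demo_name_module
print(demo_name_module.__name__, demo_name_module.SELF_TEST_RAN)  # demo_name_module False
```

In both modes the function fahrenheit_to_celsius is defined and usable; only the block guarded by the __name__ test differs.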
Reloading Modules
As we have seen, when we import
a module, the code is executed only once when the module is imported the first time. Subsequent imports use the already loaded module object without reloading or rerunning the file’s code.
To force a module’s code to be reloaded and rerun, you need to instruct Python to do so explicitly by calling the reload
function, which in Python 3 is provided by the standard library module importlib (the older imp module is deprecated). The reload
function reruns a module file’s code and overwrites its existing namespace, rather than deleting the module object and re-creating it. Also, the reload
function returns the module object, which is displayed in the output of the cell.
[95]:
import my_module
[96]:
from importlib import reload
reload(my_module)
I am inside my_module
The value of the variable X is: 3
[96]:
<module 'my_module' from 'C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules\\my_module.py'>
Reloading is helpful when we want to pick up changes that we have made to a module file. Since we use Jupyter notebooks, an alternative is to restart the kernel after editing the file; a fresh import
then executes the updated code, without the need to use reload
.
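The difference between a repeated import and an explicit reload can be sketched with a module file that is edited between the two calls. The file name demo_reload_module.py and its contents are hypothetical. Bytecode caching is switched off in this sketch as a precaution, so that the edited source is always the version that gets executed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Precaution for this demo: do not write .pyc files for the temporary module
sys.dont_write_bytecode = True

folder = Path(tempfile.mkdtemp())
module_file = folder / "demo_reload_module.py"   # hypothetical module file
module_file.write_text("X = 3\n")

sys.path.insert(0, str(folder))
importlib.invalidate_caches()

import demo_reload_module
first_value = demo_reload_module.X

# Edit the file on disk, then import again: the loaded module is reused as-is
module_file.write_text("X = 300\n")
import demo_reload_module
value_after_second_import = demo_reload_module.X

# An explicit reload reruns the file and updates the existing module object
reloaded = importlib.reload(demo_reload_module)
value_after_reload = demo_reload_module.X

print(first_value, value_after_second_import, value_after_reload)  # 3 3 300
print(reloaded is demo_reload_module)                              # True
```

Note that reload updates the module object in place, so names that were copied earlier with a from statement keep their old values until they are imported again.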
Module Packages Reloading
Just like module files, an already imported directory needs to be passed to reload
to force re-execution of the code. As shown, reload
accepts a dotted path name to reload nested directories and files. Also, reload
returns the module object in the displayed output of the cell.
[97]:
# Repeated import statements do not produce any output
import MyMainPackage.MySubPackage.sub_script
[98]:
from importlib import reload
reload(MyMainPackage.MySubPackage.sub_script)
I am inside sub_script, which is located in MySubPackage
The value of the variable X is: 23
[98]:
<module 'MyMainPackage.MySubPackage.sub_script' from 'C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules\\MyMainPackage\\MySubPackage\\sub_script.py'>
Once imported, sub_script
becomes a module object nested in the object MySubPackage
, which in turn is nested in the object MyMainPackage
.
Similarly, MySubPackage
is a module object that is nested in the object MyMainPackage
.
[99]:
MyMainPackage.MySubPackage
[99]:
<module 'MyMainPackage.MySubPackage' from 'C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules\\MyMainPackage\\MySubPackage\\__init__.py'>
Python Path
If the directory MyMainPackage
is not in the current working directory, then the directory that contains it may need to be added to the Python search path. To do that, either add the full path of that containing directory to the PYTHONPATH environment variable (on Windows systems, by editing the Environment Variables), or add it to a .pth
file. Note that if the module or package belongs to the standard library (e.g., random
, time
, sys
, os
), or if it is located in the
site-packages directory (where third-party libraries are installed), it will be automatically found by Python, and it does not need to be added to the Python search path.
Alternatively, the path to the directory can be manually added using sys.path
(that is, the path
attribute of the standard library module sys
). For instance, I can examine the sys.path
on my computer, as shown in the following cell. Since the sys.path
is just a list of directories, we can manually add the path of another directory by using the append
list method.
[100]:
import sys
sys.path
[100]:
['C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules',
'C:\\Users\\vakanski\\anaconda3\\python311.zip',
'C:\\Users\\vakanski\\anaconda3\\DLLs',
'C:\\Users\\vakanski\\anaconda3\\Lib',
'C:\\Users\\vakanski\\anaconda3',
'',
'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages',
'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages\\win32',
'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages\\win32\\lib',
'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages\\Pythonwin']
[101]:
sys.path.append('C:\\Users\\Alex\\Desktop\\python\\Lecture 6 Module Packages')
[102]:
sys.path
# The appended path is listed last
[102]:
['C:\\Users\\vakanski\\Documents\\Codes\\2023 Codes\\Python for Data Science Course\\Lectures_2023\\Theme_1-Python_Programming\\Lecture_6-Exceptions,_Modules\\Posted\\Lecture_6-Exceptions,_Modules',
'C:\\Users\\vakanski\\anaconda3\\python311.zip',
'C:\\Users\\vakanski\\anaconda3\\DLLs',
'C:\\Users\\vakanski\\anaconda3\\Lib',
'C:\\Users\\vakanski\\anaconda3',
'',
'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages',
'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages\\win32',
'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages\\win32\\lib',
'C:\\Users\\vakanski\\anaconda3\\Lib\\site-packages\\Pythonwin',
'C:\\Users\\Alex\\Desktop\\python\\Lecture 6 Module Packages']
Notice that the appended directory is now listed last in sys.path
. However, this modified sys.path
is temporary and it is valid only for the duration of the current session; the path is reset every time Jupyter Notebook is restarted, or the notebook kernel is shut down. On the other hand, the path configuration in PYTHONPATH
is permanent, and it persists after the current session is terminated.
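The effect of appending to sys.path can be verified with a module that lives outside the search path. In this sketch, the temporary folder and the file demo_path_module.py are hypothetical stand-ins for a directory such as the one appended in the cells above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module stored in a directory that is not on the search path
folder = Path(tempfile.mkdtemp())
(folder / "demo_path_module.py").write_text("X = 12\n")

# Before the directory is added, the import fails
try:
    import demo_path_module
    found_before = True
except ModuleNotFoundError:
    found_before = False

# sys.path is a plain list, so the directory can be appended to it
sys.path.append(str(folder))
importlib.invalidate_caches()

# Now the import succeeds, for the duration of the current session only
import demo_path_module
print(found_before)            # False
print(demo_path_module.X)      # 12
print(sys.path[-1] == str(folder))  # True
```

Appending places the directory at the end of the list, so it is searched last; inserting at position 0 would give it priority over the other entries.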
Package Relative Imports
To illustrate package relative imports in Python we will use the MyRelativeImportPackage
which is similar to the MyMainPackage
and contains several simple files.
MyRelativeImportPackage
├── __init__.py
├── relative_import_script_1
├── relative_import_script_2
├── relative_import_script_5
├── relative_import_script_6
├── script_1
├── script_2
├── script_3
├── script_4
└── MySubPackage
    ├── __init__.py
    ├── relative_import_script_3
    ├── relative_import_script_4
    └── sub_script
When modules within a package need to import other names from other modules in the same package, it is still possible to use the full path syntax for importing, as we did in the above section. This is called an absolute import.
For instance, the relative_import_script_1.py
in the first line imports script_1
by using the full name of the directory (i.e., from MyRelativeImportPackage import script_1
).
[103]:
import MyRelativeImportPackage.relative_import_script_1
I am inside script_1, which is located in MyMainPackage
--------------------
I am inside the relative_import_scipt_1, which is located in MyMainPackage
However, package files can also make use of a special syntax to simplify import statements within the same package. Instead of using the full path to the directory, Python allows us to use a leading dot .
to refer to the current directory in the package.
Therefore, instead of using from MyRelativeImportPackage import script_1
, we can use from . import script_1
. This is implemented in the relative_import_script_2.py
to import script_2
.
This syntax is referred to as a relative import because the path to the module to be imported is specified relative to the directory of the importing module.
The convenience of using relative imports is that we don’t need to write the name or the path of the current directory.
[104]:
import MyRelativeImportPackage.relative_import_script_2
I am inside script_2, which is located in MyMainPackage
--------------------
I am inside the relative_import_scipt_2, which is located in MyMainPackage
One more example is presented in the next cell, where the module relative_import_script_3.py
is located in the directory MySubPackage
and it imports the module sub_script
which is located in the same directory by using the .
syntax.
[105]:
import MyRelativeImportPackage.MySubPackage.relative_import_script_3
I am inside sub_script, which is located in MySubPackage
--------------------
I am inside the relative_import_scipt_3, which is located in MySubPackage
If we use the two-dot syntax ..
, then a module can import another module that is located in the parent directory of its own package (i.e., the directory above). For example, the relative_import_script_4.py
is located in the MySubPackage
directory, and it uses from .. import script_3
to import the script_3
module that is located in the parent directory of MySubPackage
, that is, MyRelativeImportPackage
.
[106]:
import MyRelativeImportPackage.MySubPackage.relative_import_script_4
I am inside script_3, which is located in MyMainPackage
--------------------
I am inside the relative_import_scipt_4, which is located in MySubPackage
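On the other hand, if we tried to use only import script_3
instead of from . import script_3
, this will fail: a plain import is treated as an absolute import, and script_3 is not located in a directory on the module search path. To import modules located in the same package, we must use either the from
dotted syntax or the full absolute path. This is illustrated in the example in the following cell.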
[107]:
import MyRelativeImportPackage.relative_import_script_5
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[107], line 1
----> 1 import MyRelativeImportPackage.relative_import_script_5
File ~\Documents\Codes\2023 Codes\Python for Data Science Course\Lectures_2023\Theme_1-Python_Programming\Lecture_6-Exceptions,_Modules\Posted\Lecture_6-Exceptions,_Modules\MyRelativeImportPackage\relative_import_script_5.py:1
----> 1 import script_3
3 print(20*'-')
4 print('I am inside the relative_import_scipt_5, which is located in MyMainPackage')
ModuleNotFoundError: No module named 'script_3'
Another way to use the relative imports is shown in the relative_import_script_6.py
module, where the syntax from .script_4 import X
is used to import the name X
from the script_4
module which is located in the same directory as the importer module. This way, we can import specific names from modules in the same package.
[108]:
import MyRelativeImportPackage.relative_import_script_6
I am inside script_4, which is located in MyMainPackage
--------------------
I am inside the relative_import_scipt_6, which is located in MyMainPackage
The value of the variable X is 5
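The relative import forms discussed in this section can be tried together on a throwaway package. The sketch below assumes a hypothetical package demo_relative_package created in a temporary directory; its file names are illustrative and do not reproduce the files in MyRelativeImportPackage.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package with one sub-package, created in a temporary directory
root = Path(tempfile.mkdtemp())
pkg = root / "demo_relative_package"
sub = pkg / "demo_sub_package"
sub.mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(sub / "__init__.py").write_text("")
(pkg / "demo_script_a.py").write_text("X = 5\n")

# One dot: import from the importer's own package
(pkg / "same_dir_importer.py").write_text(
    "from . import demo_script_a\nfrom .demo_script_a import X\n")
# Two dots: import from the parent package
(sub / "parent_importer.py").write_text("from .. import demo_script_a\n")
# Plain import: treated as absolute, so the sibling module is not found
(pkg / "plain_importer.py").write_text("import demo_script_a\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_relative_package.same_dir_importer as same_dir
import demo_relative_package.demo_sub_package.parent_importer as parent

try:
    import demo_relative_package.plain_importer
    plain_import_worked = True
except ModuleNotFoundError:
    plain_import_worked = False

print(same_dir.X)                                       # 5
print(parent.demo_script_a is same_dir.demo_script_a)   # True
print(plain_import_worked)                              # False
```

Both relative forms resolve to the same module object, while the plain import fails because only the directory containing the package, not the package directory itself, is on the search path.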
Absolute imports are often preferred because they are straightforward, and it is easy to tell exactly where the imported module or name is located, just by looking at the statement. But, they require more typing and writing full names and paths in the code.
One clear advantage of relative imports is that they are quite succinct, and they can turn a very long import statement into a simple and short statement. However, relative imports can be messy, particularly for projects where the organization of the directories is likely to change. They are also not as readable as absolute imports, because it is harder to tell the location of the imported names.
References
Mark Lutz, “Learning Python,” 5th edition, O'Reilly, 2013. ISBN: 978-1-449-35573-9.
Pierian Data Inc., “Complete Python 3 Bootcamp,” codes available at: https://github.com/Pierian-Data/Complete-Python-3-Bootcamp.