Working with text
At some point you have to write your Hello World application. This is usually considered the first thing you do when learning a new programming language. Well, we have done other things first. However, it’s never too late, is it?
It also brings another data type into play: the string type str, and how to deal with it. So let’s create our first string now.
message = "Hello World"
We can output the contents of our variable as usual in our Jupyter environment by just executing a cell with the variable name in it.
message
'Hello World'
Usually, however, something more explicit is needed to output text. Python offers the print function for this, another builtin function right at our fingertips.
print(message)
Hello World
This does the job in any Python environment. It outputs the provided message followed by a newline, so that a follow-up call to print starts at the beginning of the next line.
Tip
Appending a newline to the printed text is the default behaviour of print when used this way. But print is much more powerful. Use the help builtin function with print as its argument to learn more about what print has to offer.
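As a small taste of what help(print) describes, the following sketch uses the sep, end and file keyword arguments of print. The variable names buffer and captured are only chosen for this illustration.

```python
import io

message = "Hello World"

# sep controls what is placed between several arguments (default: one space)
print("Hello", "World", sep=", ")

# end controls what is appended after the output (default: a newline)
print(message, end="!\n")

# file redirects the output, here into an in-memory text buffer
buffer = io.StringIO()
print(message, file=buffer)
captured = buffer.getvalue()  # the text that print wrote, including the newline
```

The first two calls write Hello, World and Hello World! to the usual output, while the third one writes nothing visible and stores the text in captured instead.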
Strings can be understood as arrays of characters. In programming languages such as C, a string is just that and nothing more. In Python it is an object, as everything is an object in Python, and has more to offer. Let’s have a look using the dir function.
dir(message)
['__add__',
'__class__',
'__contains__',
'__delattr__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__getitem__',
'__getnewargs__',
'__getstate__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__iter__',
'__le__',
'__len__',
'__lt__',
'__mod__',
'__mul__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__rmod__',
'__rmul__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'capitalize',
'casefold',
'center',
'count',
'encode',
'endswith',
'expandtabs',
'find',
'format',
'format_map',
'index',
'isalnum',
'isalpha',
'isascii',
'isdecimal',
'isdigit',
'isidentifier',
'islower',
'isnumeric',
'isprintable',
'isspace',
'istitle',
'isupper',
'join',
'ljust',
'lower',
'lstrip',
'maketrans',
'partition',
'removeprefix',
'removesuffix',
'replace',
'rfind',
'rindex',
'rjust',
'rpartition',
'rsplit',
'rstrip',
'split',
'splitlines',
'startswith',
'strip',
'swapcase',
'title',
'translate',
'upper',
'zfill']
Quite interesting. First of all, some attributes look a bit strange compared to the ones at the bottom. They start and end with a double underscore, such as __len__. These are called dunder methods (for double underscore), and they usually implement a certain protocol that can be used on the object, such as iteration, or implement operators that are useful in the context. Addition and multiplication are implemented this way, for example by the __add__ and __mul__ methods. Strings also offer a __len__ method. This lets us call the len builtin function on our message to see how many characters it contains. Let’s have a look.
len(message)
11
Okay. So our message consists of 11 characters: 10 letters plus the space. Interested in some more useful methods? Here we go.
upper returns a version of our message that is all uppercase.
message.upper()
'HELLO WORLD'
capitalize makes the first letter of our string upper case, while all others become lower case.
message.capitalize()
'Hello world'
swapcase toggles upper and lower case letters.
message.swapcase()
'hELLO wORLD'
replace exchanges a part of our message with another one. Like the other methods shown here, it returns a new string and leaves message itself unchanged. To replace the word World with You, we would use the following.
message.replace("World", "You")
'Hello You'
Tip
To learn more about the methods provided by the string type, use the help builtin on them again, for example help(message.title) to learn more about the title method.
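A few more methods from the dir listing are worth trying right away. The following is only a small selection; the variable names are made up for this sketch.

```python
message = "Hello World"

# split breaks the string at whitespace and returns a list of words
words = message.split()                # ['Hello', 'World']

# join is the counterpart: it glues a list of strings together
joined = "-".join(words)               # 'Hello-World'

# strip removes leading and trailing whitespace
cleaned = "  Hello World \n".strip()   # 'Hello World'

# startswith returns a boolean
starts = message.startswith("Hello")   # True

# find returns the index of the first match, or -1 if there is none
position = message.find("World")       # 6
missing = message.find("Mars")         # -1
```

None of these calls modifies message. Each one returns a new object, just like upper and replace do.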
Looking more closely at the dir output, we see that addition (__add__) and multiplication (__mul__) actually exist for strings. How does that even make sense? Sure, you can do these things with numbers, but with strings? Let’s just try it!
message + "!"
'Hello World!'
Okay, so addition of strings simply extends our message. We were able to add another string ! to our message. That’s quite nice. But what about multiplication?
message * "!"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[11], line 1
----> 1 message * "!"
TypeError: can't multiply sequence by non-int of type 'str'
So multiplying by another string is not allowed. Python raises a TypeError here. But what about using numbers?
message * 4
'Hello WorldHello WorldHello WorldHello World'
This makes sense. Multiplying by an integer creates a new string that repeats our message 4 times.
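To connect the operators back to the dunder methods from the dir output, here is a minimal sketch. Calling dunder methods directly is rarely done in practice; it is shown here only to illustrate which method stands behind which operator.

```python
message = "Hello World"

# The + operator is backed by __add__, the * operator by __mul__
assert message + "!" == message.__add__("!")
assert message * 2 == message.__mul__(2)

# len() is backed by __len__
assert len(message) == message.__len__() == 11

# Indexing uses __getitem__, the in operator uses __contains__
first = message[0]               # 'H'
last = message[-1]               # 'd'
has_world = "World" in message   # True

# __rmul__ handles the case where the number comes first
repeated = 2 * message           # 'Hello WorldHello World'

# Multiplying by a string fails, and the error can be caught
try:
    message * "!"
    failed = False
except TypeError:
    failed = True
```

Indexing also shows the array-of-characters view of a string: every character can be reached by its position, counting from 0, or from the end with negative numbers.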
Format strings
There is a special syntax for strings as well. If the string is preceded by the letter f, Python parses the text and looks for identifiers in curly braces, which are replaced with the content of the corresponding variables. For example
f"{message}!"
'Hello World!'
will extend our message with an exclamation mark by replacing the identifier in curly braces with the contents of the variable message.
This will basically work with all printable variables. So variables with numbers can be used as well.
a = 10.
b = 12.
f"The value of a is {a}, b is {b} currently."
'The value of a is 10.0, b is 12.0 currently.'
Even small Python expressions can be used inside the curly braces.
f"The product of a with value {a} and b with value {b} is {a*b}."
'The product of a with value 10.0 and b with value 12.0 is 120.0.'
Also function calls are allowed.
import math
f"The value of pi/2 is {math.pi/2}. The sine value of pi/2 is {math.sin(math.pi/2)}"
'The value of pi/2 is 1.5707963267948966. The sine value of pi/2 is 1.0'
The values can even be formatted to produce more focused output.
f"The value of pi/2 is roughly {math.pi/2:.2f}."
'The value of pi/2 is roughly 1.57.'
This will limit the output in floating point notations to two digits after the decimal point.
Here are some more examples. I think you get the idea!
value = 120.12
f"{value:7.0f}"
'    120'
f"{value:7.2f}"
' 120.12'
f"{value:0.2f}"
'120.12'
f"{value:07.2f}"
'0120.12'
Note
Please note that the width given in a floating point format specification is the total number of output characters, so the decimal point counts as an output character as well.
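The part after the colon can do more than set width and precision. The following sketch shows a few further options: alignment, a thousands separator, percent and scientific notation. The variable names are chosen only for this illustration.

```python
value = 120.12

# Alignment within a field of 10 characters: > right, < left, ^ centered
right = f"{value:>10.2f}"          # '    120.12'
left = f"{value:<10.2f}|"          # '120.12    |'
centered = f"{value:^10.2f}"       # '  120.12  '

# A comma groups the digits in thousands
grouped = f"{1234567.891:,.2f}"    # '1,234,567.89'

# % multiplies by 100 and appends a percent sign
percent = f"{0.256:.1%}"           # '25.6%'

# e switches to scientific notation
scientific = f"{value:.3e}"        # '1.201e+02'
```

As before, the number in front of the dot is the total width, and the number after the dot is the count of digits after the decimal point.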
A relatively new syntax for format strings, available since Python 3.8, is the following.
f"{a=}, {b=}"
'a=10.0, b=12.0'
This is a very useful shortcut for debugging purposes. If you have doubts about the correctness of an algorithm you’re currently working on, add a simple print statement using this format string for the variables you’re using at the point of interest, and you might be quickly enlightened.
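Here is how such a debugging print might look in practice. The function mean is a made-up example for this sketch, and the = specifier requires Python 3.8 or newer.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = sum(values)
    count = len(values)
    # Temporary debug output: prints the names together with their values
    print(f"{total=}, {count=}")
    return total / count


result = mean([1.0, 2.0, 6.0])

# The = specifier also works with expressions and format specifications
debug_line = f"{result=:.2f}, {result * 2=}"
```

The call prints total=9.0, count=3, and debug_line becomes 'result=3.00, result * 2=6.0'. Once the algorithm behaves as expected, the print line can simply be removed again.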