SDAV Basics


In this worksheet, you will learn:

  • Basics of Python programming
  • Introduction to some of the data science libraries

1. Basics of Python programming

First, let's look at some basics of Python and get used to manipulating data using the built-in variable types.

In [4]:
# First some simple variables

an_integer = 12
a_floating_point_number = 18.4732
In [6]:
# We can do some simple maths on these variables and see the output

an_integer + a_floating_point_number
Out[6]:
30.4732
In [7]:
# We can also write functions to do simple maths

def multiply(number1, number2):
    return number1 * number2

multiply(an_integer, a_floating_point_number)
Out[7]:
221.67839999999998
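
A note on the output above: the result is 221.67839999999998 rather than 221.6784 because Python floats are stored in binary floating point, and most decimal fractions cannot be represented exactly. The sketch below shows two common ways of coping with this: rounding for display, and comparing with a tolerance instead of `==`. The names `product` and `rounded` are illustrative and not part of the worksheet.

```python
import math

an_integer = 12
a_floating_point_number = 18.4732

product = an_integer * a_floating_point_number

# Round to 4 decimal places when displaying the value
rounded = round(product, 4)
print(rounded)

# Compare floats with a tolerance rather than testing exact equality
print(math.isclose(product, 221.6784))
```

Rounding changes only how the number is shown or stored afterwards; `math.isclose` is usually the safer choice when checking whether two calculated values agree.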
In [9]:
# We can also create text variables just as easily

a_string = 'Hello there!'
print (a_string)
Hello there!
In [20]:
# We can do some simple manipulation of text

my_name = 'Phil'
message = a_string + ' My name is ' + my_name
print (message)

# Including splitting sentences into words to corrupt a message
imposter_name = 'Dave'
s_m = message.split(" ")
new_message = ' '.join(s_m[:-1]) + ' ' + imposter_name
print (new_message)
Hello there! My name is Phil
Hello there! My name is Dave
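
The cell above builds text with `+` and swaps the last word using `split` and `join`. As a hedged alternative, the sketch below does the same job with an f-string and `str.replace`, both of which are built into Python. The variable `words` is an illustrative name added here to show what `split` returns.

```python
a_string = 'Hello there!'
my_name = 'Phil'

# f-strings embed variables directly inside the text
message = f'{a_string} My name is {my_name}'
print(message)

# str.replace returns a new string with one piece of text swapped for another
new_message = message.replace('Phil', 'Dave')
print(new_message)

# split breaks the text into a list of words
words = message.split(' ')
print(words)
```

Note that `replace` changes every occurrence of the text it is given, so the `split` and `join` approach is still useful when only a word at a known position should change.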
In [32]:
# We also have lists, which can store several values in order
fruits = ['apple','banana','orange','lemon']
print (fruits)
# We can access part of a list by slicing it with indexes
print (fruits[0:2])
print (fruits[:-1])
# We can append items to the list, and remove items from the list
fruits.append('mango')
print (fruits)
fruits.remove('banana')
print (fruits)
['apple', 'banana', 'orange', 'lemon']
['apple', 'banana']
['apple', 'banana', 'orange']
['apple', 'banana', 'orange', 'lemon', 'mango']
['apple', 'orange', 'lemon', 'mango']
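
To make the indexing rules used above a little more concrete, here is a small sketch. Indexes start at 0, a negative index counts back from the end, and a slice `[start:stop]` includes `start` but excludes `stop`. The variable names (`first`, `last`, `middle`, `ordered`) are illustrative.

```python
fruits = ['apple', 'banana', 'orange', 'lemon']

first = fruits[0]         # the first item
last = fruits[-1]         # the last item
middle = fruits[1:3]      # items at positions 1 and 2 (position 3 is excluded)

# sorted() returns a new, alphabetically ordered list and leaves 'fruits' unchanged
ordered = sorted(fruits)

print(first, last)
print(middle)
print(ordered)
print(len(fruits))        # number of items in the list
```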
In [33]:
# We can also create dictionary objects 
# This is helpful for storing related variables about an object

person = {}
person['name'] = 'bob'
person['age'] = 23
person['height'] = 185
person['email'] = 'bob@bobmail.com'
print (person)
{'name': 'bob', 'age': 23, 'height': 185, 'email': 'bob@bobmail.com'}
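
Once a dictionary exists, values are read back by key. The sketch below shows a few common operations; the `'phone'` key is a made-up example of a key that is not present, used only to demonstrate `get` with a default.

```python
person = {'name': 'bob', 'age': 23, 'height': 185, 'email': 'bob@bobmail.com'}

# Look up a value by its key
print(person['name'])

# get() returns a default instead of raising KeyError when the key is missing
phone = person.get('phone', 'unknown')
print(phone)

# Update an existing value
person['age'] = person['age'] + 1

# Loop over every key and value
for key, value in person.items():
    print(key, '->', value)
```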
In [65]:
# Like earlier, we could use a function to create 'person' objects
people = []

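def create_person(name, age, height, email):
    # Build a dictionary for one person and add it to the shared 'people' list.
    # list.append changes the existing list, so no 'global' statement is needed.
    new_person = {'name': name,
                  'age': age,
                  'height': height,
                  'email': email}
    people.append(new_person)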

create_person('bob', 23, 177, 'bob@bobmail.com')
create_person('john', 41, 185, 'john@johnmail.com')
create_person('sophie', 31, 157, 'sophie@sophiemail.com')
create_person('wendy', 19, 174, 'wendy@wendymail.com')


# Here we store our person objects in our people list
# to make a group of 'persons' - a.k.a. people!
print (people)
[{'name': 'bob', 'age': 23, 'height': 177, 'email': 'bob@bobmail.com'}, {'name': 'john', 'age': 41, 'height': 185, 'email': 'john@johnmail.com'}, {'name': 'sophie', 'age': 31, 'height': 157, 'email': 'sophie@sophiemail.com'}, {'name': 'wendy', 'age': 19, 'height': 174, 'email': 'wendy@wendymail.com'}]
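
The function above adds each person to a shared list as a side effect. A possible alternative, sketched below, is to have the function return the dictionary and let the caller decide where to store it. This is a design choice rather than a correction, and `make_person`, `names`, and `oldest` are hypothetical names introduced for this example.

```python
def make_person(name, age, height, email):
    # Return a new dictionary instead of modifying a list defined elsewhere
    return {'name': name, 'age': age, 'height': height, 'email': email}

people = [
    make_person('bob', 23, 177, 'bob@bobmail.com'),
    make_person('john', 41, 185, 'john@johnmail.com'),
    make_person('sophie', 31, 157, 'sophie@sophiemail.com'),
    make_person('wendy', 19, 174, 'wendy@wendymail.com'),
]

# A list comprehension pulls one field out of every person
names = [p['name'] for p in people]
print(names)

# max() with a key function finds the person with the largest age
oldest = max(people, key=lambda p: p['age'])
print(oldest['name'])
```

Functions that return values are generally easier to test and reuse, because their result does not depend on a variable defined outside them.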

2. Introducing data science libraries

We have covered a lot very quickly here. You have now used Python's main built-in types for storing numerical and text data, along with two data structures: lists (ordered, resizable sequences, similar to arrays in other languages) and dictionaries (key-value mappings, similar to simple objects or records). Let's build on this by introducing some of the data science libraries.

In [73]:
# We can import libraries using the following
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
In [67]:
# Our people list (a list of dictionaries) is difficult for us to read clearly
# Pandas DataFrames make tabular data like this much easier to view and manipulate

data = pd.DataFrame(people)
data
Out[67]:
   age                  email  height    name
0   23        bob@bobmail.com     177     bob
1   41      john@johnmail.com     185    john
2   31  sophie@sophiemail.com     157  sophie
3   19    wendy@wendymail.com     174   wendy
In [68]:
# We can access individual columns of the data now
data['age']
Out[68]:
0    23
1    41
2    31
3    19
Name: age, dtype: int64
In [69]:
# Who is the tallest of the users? Let's find out
data[data['height'] == np.max(data['height'])]
Out[69]:
   age              email  height  name
1   41  john@johnmail.com     185  john
In [70]:
# Who is the shortest of the users? Let's find out
data[data['height'] == np.min(data['height'])]
Out[70]:
   age                  email  height    name
2   31  sophie@sophiemail.com     157  sophie
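
The two cells above filter the table by comparing each height with the maximum or minimum. Pandas also offers its own methods for this kind of question. The sketch below rebuilds the same table so it can run on its own, then uses `idxmax`, `sort_values`, a boolean filter, and `mean`. The variable names (`tallest`, `by_height`, `over_30`, `mean_height`) are illustrative.

```python
import pandas as pd

people = [
    {'name': 'bob', 'age': 23, 'height': 177, 'email': 'bob@bobmail.com'},
    {'name': 'john', 'age': 41, 'height': 185, 'email': 'john@johnmail.com'},
    {'name': 'sophie', 'age': 31, 'height': 157, 'email': 'sophie@sophiemail.com'},
    {'name': 'wendy', 'age': 19, 'height': 174, 'email': 'wendy@wendymail.com'},
]
data = pd.DataFrame(people)

# idxmax() gives the row label of the largest value; loc[] fetches that row
tallest = data.loc[data['height'].idxmax()]
print(tallest['name'])

# Sort the whole table from tallest to shortest
by_height = data.sort_values('height', ascending=False)
print(by_height)

# Keep only the rows where a condition is true
over_30 = data[data['age'] > 30]
print(over_30)

# Summary statistics are available as methods on a column
mean_height = data['height'].mean()
print(mean_height)
```

One difference worth knowing: `idxmax` returns a single row even when several rows tie for the maximum, whereas the comparison used in the cells above returns every tied row.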
In [83]:
# What if we want to plot this data quickly?
data.plot()
plt.show()
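
Calling `data.plot()` with no arguments draws every numeric column as a line against the row index, which is not always the most readable view. As one possible refinement, the sketch below draws a bar chart of height for each person. It rebuilds a small version of the table so it can run on its own; the axis label and title text are assumptions added for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({
    'name': ['bob', 'john', 'sophie', 'wendy'],
    'age': [23, 41, 31, 19],
    'height': [177, 185, 157, 174],
})

# One bar per person, with names along the x-axis and height on the y-axis
ax = data.plot.bar(x='name', y='height', legend=False)
ax.set_ylabel('Height')
ax.set_title('Height of each person')
```

Inside a notebook the chart appears under the cell; when running the code as a script, follow it with `plt.show()` as in the cell above.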

Now what?

Spend some time researching pandas, Matplotlib, and NumPy. These libraries are core to manipulating numerical and tabular data and to visualizing the results. There are many good examples online for getting started with them.

For example, the free e-book 'A Whirlwind Tour of Python': http://www.oreilly.com/programming/free/a-whirlwind-tour-of-python.csp