UFCFFY-15-M Cyber Security Analytics

Assignment: Task 1


Portfolio Task 1 (Conduct an investigation on a web application to identify malicious attack activity using Python data science libraries) is worth 20% of your portfolio for the UFCFFY-15-M Cyber Security Analytics (CSA) module. Please refer to your Assignment Overview for full details.

Portfolio Task 1: Conduct an investigation on a web application to identify malicious attack activity using Python data science libraries (20%)


For this task, you will be provided with a personalised dataset that you are expected to analyse. You should aim to identify any suspicious activities that have occurred in the dataset, based on your knowledge and understanding of web application security. You will need to ensure that your submission is made based on the information in your assigned dataset - failure to use the dataset assigned to your username will result in a zero grade. Your portfolio submission for this task should be an HTML export of your IPYNB Jupyter notebook that details your investigation using appropriate code cells to perform the required analysis and Markdown cells to explain your work.

As a cyber security analyst, you have been provided with a set of logs from your organisation's web server. You will need to analyse these logs and seek out suspicious activity based on the data available.

More information about Microsoft Internet Information Services (IIS) can be found at the following URL: https://docs.microsoft.com/en-us/previous-versions/iis/6.0-sdk/ms525410(v=vs.90)

Dataset: Please see the folder "Portfolio Assignment" under the Assignment tab on Blackboard for further detail related to the access and download of the necessary dataset.

Hint: The TryHackMe room "HTTP in detail" may help you decide what to investigate within this large dataset.

Assessment and Marking


Criteria | 0-39 | 40-49 | 50-59 | 60-69 | 70-84 | 85-100
Identification of the suspicious activity (25%) | No evidence of progress | A limited attempt to address this criterion | Some correct detail is identified but perhaps not all | Most correct detail is identified | All correct detail identified with good justification | All correct detail identified with excellent justification
Analytical reasoning and process (25%) | No evidence of progress | A limited attempt to address this criterion | Some evidence of analysis but perhaps some flaws in the approach | Evidence of analysis with only some minor flaws | Very good analytical approach | Excellent analytical approach
Python and Pandas proficiency (25%) | No evidence of progress | A limited attempt to address this criterion | Some fair usage but perhaps not optimal | Good usage of Python and Pandas with only minor flaws | Very good usage of in-built functions | Excellent professional usage of in-built functions
Clarity and professional report presentation (25%) | No evidence of progress | A limited attempt to address this criterion | Some evidence of markdown commentary but with major flaws | Markdown commentary with only minor flaws | Very good detail in markdown commentary | Excellent detail in markdown commentary

To achieve the higher end of the grade scale, you need to demonstrate how you conducted your investigation and identified the malicious activity, and show a good command of Pandas data analysis throughout.

Submission Documents


Your submission for this task should include:

  • 1 Jupyter Notebook exported in HTML format. You should complete your work using the IPYNB file provided (i.e., this document). You should also complete the self-assessment section (below). Once you have completed your work, you should use the export function in Jupyter to save your notebook as an HTML document. Do not submit an IPYNB file - we will not execute any code during marking. Therefore, you must ensure that all cell output is clear in your HTML document for your marker.

Your final portfolio should be submitted to Blackboard by 14:00 on 12th May 2022. Your Blackboard submission should consist of the following individual files:

  • Task1.html (an HTML document exported from Jupyter notebook for Task 1)
  • Task1.ipynb (source Jupyter notebook for Task 1)
  • Task2.html (an HTML document exported from Jupyter notebook for Task 2)
  • Task2.ipynb (source Jupyter notebook for Task 2)
  • Task3.pdf (a PDF report of your research investigation for Task 3)
  • Task4.mp4 (an MP4 video file, or similar standard format - or a URL to an online video - for Task 4)

Please do not ZIP the files together as a single submission on Blackboard; you can submit multiple files to Blackboard.

Self-Assessment


For each criterion, please reflect on the marking rubric and indicate what grade you would expect to receive for the work that you are submitting. For your own personal development and learning, it is important to reflect on your work and to attempt to assess it carefully. Do think carefully about both positive aspects of your work, as well as any limitations you may have faced.

  • Identification of the suspicious activity (25%): You estimate that your grade will be __.

  • Analytical reasoning and process (25%): You estimate that your grade will be __.

  • Python and Pandas proficiency (25%): You estimate that your grade will be __.

  • Clarity and professional report presentation (25%): You estimate that your grade will be __.

Please provide a minimum of two sentences to comment and reflect on your own self-assessment: __. __.

Contact


Questions about this assignment should be directed to your module leader (Phil.Legg@uwe.ac.uk). You should use the online Q&A form to ask questions related to this module and this assignment, as well as utilising the on-site teaching sessions.


In [1]:
# Import libraries as required
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.set_option('display.max_rows', 10)

In the cell below, you will need to change data_file to your own specific data filename. The example data file is purely to demonstrate some initial steps for your investigation and should not be used.

In [2]:
data_file = 'example_data'
In [3]:
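# Load the data set as required
data_path = './example_data/'
# sep=r'\s+' splits on any run of whitespace (replaces the deprecated delim_whitespace=True)
data = pd.read_csv(data_path + data_file, sep=r'\s+')
#data.to_csv('out.csv')
# The header row appears to carry one extra leading token, so the column names
# are shifted right by one: drop the empty last column and re-label the rest
temp_df = data[data.columns[:-1]].copy()
temp_df.columns = data.columns[1:]
data = temp_df
data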
Out[3]:
date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
0 2022-01-01 00:40:00 57.214.107.110 GET yxptqave.js v=566475 443 - 141.144.38.173 Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.... - 200 0 0 27
1 2022-01-01 00:40:00 57.214.107.110 GET iooyxvph.js v=109316 443 - 141.144.38.173 Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.... - 200 0 0 24
2 2022-01-01 00:40:00 57.214.107.110 GET nrsxibon.css - 443 - 141.144.38.173 Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.... - 200 0 0 22
3 2022-01-01 00:40:00 57.214.107.110 GET index.aspx - 443 - 141.144.38.173 Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.... - 200 0 0 26
4 2022-01-01 00:40:18 57.214.107.110 GET fhlxfybr.css - 443 - 141.144.38.173 Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.... https://bankofpunk.local/index.aspx 200 0 0 27
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
70562 2022-01-30 22:17:49 57.214.107.110 GET wvdcqgqr.css - 443 mj167050 93.189.110.180 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+Appl... https://bankofpunk.local/transactions.aspx 200 0 0 20
70563 2022-01-30 22:17:49 57.214.107.110 GET transactions.aspx page=1 443 mj167050 93.189.110.180 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+Appl... https://bankofpunk.local/transactions.aspx 200 0 0 25
70564 2022-01-30 22:18:00 57.214.107.110 GET favico.ico - 443 mj167050 93.189.110.180 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+Appl... https://bankofpunk.local/transactions.aspx 200 0 0 25
70565 2022-01-30 22:18:00 57.214.107.110 GET main.css - 443 mj167050 93.189.110.180 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+Appl... https://bankofpunk.local/transactions.aspx 200 0 0 28
70566 2022-01-30 22:18:00 57.214.107.110 GET transactions.aspx page=2 443 mj167050 93.189.110.180 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+Appl... https://bankofpunk.local/transactions.aspx 200 0 0 28

70567 rows × 15 columns
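The date and time fields load as two separate text columns, which makes time-based filtering awkward. A minimal sketch of combining them into a single timestamp follows. It uses a tiny hand-made frame rather than any assigned dataset, and the `sample` and `timestamp` names are assumptions introduced for illustration.

```python
import pandas as pd

# Tiny hand-made sample; values are illustrative, not taken from any assigned dataset
sample = pd.DataFrame({
    'date': ['2022-01-01', '2022-01-30', '2022-01-01'],
    'time': ['00:40:00', '22:18:00', '00:40:18'],
    'sc-status': [200, 200, 200],
})

# Join the two text columns and parse them into one datetime64 column
sample['timestamp'] = pd.to_datetime(
    sample['date'] + ' ' + sample['time'],
    format='%Y-%m-%d %H:%M:%S',
)

# A sorted DatetimeIndex allows time-based slicing and resampling later on
sample = sample.set_index('timestamp').sort_index()
print(sample.index.min(), sample.index.max())
```

The same two steps could be applied to `data` once it is loaded, provided its `date` and `time` columns follow the format shown in the output above.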

In [4]:
# Get all column names
data.columns
Out[4]:
Index(['date', 'time', 's-ip', 'cs-method', 'cs-uri-stem', 'cs-uri-query',
       's-port', 'cs-username', 'c-ip', 'cs(User-Agent)', 'cs(Referer)',
       'sc-status', 'sc-substatus', 'sc-win32-status', 'time-taken'],
      dtype='object')
In [5]:
# Search for all unique entries in 'cs(Referer)'
data['cs(Referer)'].unique()
Out[5]:
array(['-', 'https://bankofpunk.local/index.aspx',
       'https://bankofpunk.local/login.aspx',
       'https://bankofpunk.local/account_status.aspx',
       'https://bankofpunk.local/transactions.aspx',
       'https://bankofpunk.local/transfer.aspx',
       'https://bankofpunk.local/transfer_complete.aspx',
       'https://bankofpunk.local/change_avatar.aspx',
       'https://bankofpunk.local/faq.aspx',
       'https://bankofpunk.local/changepassword.aspx'], dtype=object)
In [6]:
# Get count of each unique value for 'cs(Referer)'
data['cs(Referer)'].value_counts()
Out[6]:
https://bankofpunk.local/transactions.aspx         23586
https://bankofpunk.local/index.aspx                17911
https://bankofpunk.local/login.aspx                13545
https://bankofpunk.local/account_status.aspx        6238
-                                                   6039
https://bankofpunk.local/transfer.aspx              1800
https://bankofpunk.local/changepassword.aspx         629
https://bankofpunk.local/faq.aspx                    383
https://bankofpunk.local/transfer_complete.aspx      319
https://bankofpunk.local/change_avatar.aspx          117
Name: cs(Referer), dtype: int64
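`value_counts()` summarises one column at a time. To compare two columns, such as client IP against response status, `pd.crosstab` and `groupby` are one possible approach. The sketch below runs on invented rows; the IP addresses, status codes and the `status_by_ip` and `error_rate` names are assumptions for illustration only.

```python
import pandas as pd

# Illustrative rows only; the IPs and status codes are made up for this sketch
sample = pd.DataFrame({
    'c-ip': ['10.0.0.1', '10.0.0.1', '10.0.0.2', '10.0.0.2', '10.0.0.2'],
    'cs-uri-stem': ['index.aspx', 'login.aspx', 'login.aspx', 'login.aspx', 'login.aspx'],
    'sc-status': [200, 200, 401, 401, 200],
})

# Rows: client IP, columns: status code, cells: number of requests
status_by_ip = pd.crosstab(sample['c-ip'], sample['sc-status'])

# Share of each client's requests that returned a 4xx or 5xx status
is_error = sample['sc-status'] >= 400
error_rate = is_error.groupby(sample['c-ip']).mean().sort_values(ascending=False)

print(status_by_ip)
print(error_rate)
```

A client with a much higher error share than its peers is not proof of an attack, but it may be a reasonable place to look more closely.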
In [7]:
# Plot the first 100 values for the 'time-taken' column
plt.figure(figsize=(20,5))
plt.plot(data['time-taken'][:100])
plt.show()
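A plot of the first 100 rows covers only a small window of the log. As a complement, the whole `time-taken` column can be screened numerically. The following is a rough sketch using the interquartile-range rule of thumb on made-up values; the `threshold` and `slow` names are hypothetical, and the 1.5 multiplier is a convention rather than anything prescribed by this assignment.

```python
import pandas as pd

# Made-up response times; one value is deliberately extreme
sample = pd.DataFrame({'time-taken': [27, 24, 22, 26, 27, 20, 25, 25, 28, 950]})

# Upper fence from the interquartile range (a common rule of thumb, not a fixed standard)
q1 = sample['time-taken'].quantile(0.25)
q3 = sample['time-taken'].quantile(0.75)
threshold = q3 + 1.5 * (q3 - q1)

# Keep only the rows whose response time exceeds the fence
slow = sample[sample['time-taken'] > threshold]
print(threshold)
print(slow)
```

Any threshold of this kind is a judgement call, so it is worth stating the rule used and checking the flagged rows by hand.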

Start your investigation...

Carry on with the investigation based on the initial code provided above. Conclude your investigation with a summary of the activities you have identified and why you consider them suspicious.
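One possible starting point, offered only as a sketch, is to profile request volume over time so that unusual bursts stand out. The example below assumes a `timestamp` column has already been built from `date` and `time`; the rows and the `per_hour` name are invented for illustration.

```python
import pandas as pd

# Illustrative timestamps and client IPs, invented for this sketch
sample = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2022-01-01 00:40:00', '2022-01-01 00:40:18', '2022-01-01 00:55:00',
        '2022-01-01 01:10:00', '2022-01-01 03:05:00',
    ]),
    'c-ip': ['10.0.0.1', '10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.2'],
})

# Count requests in each one-hour bin; hours with no traffic appear as zero
per_hour = sample.set_index('timestamp').resample(pd.Timedelta(hours=1)).size()
print(per_hour)
```

The bin width is a free choice: shorter bins expose short bursts, while longer bins show daily patterns more clearly.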

In [ ]: