The completion of Portfolio Task 1: Conduct an investigation on a web application to identify malicious attack activity using Python data science libraries is worth 20% towards your portfolio for the UFCFFY-15-M Cyber Security Analytics (CSA) module. Please refer to your Assignment Overview for full details.
For this task, you will be provided with a personalised dataset that you are expected to analyse. You should aim to identify any suspicious activities that have occurred in the dataset, based on your knowledge and understanding of web application security. Your submission must be based on the information in your assigned dataset: failure to use the dataset assigned to your username will result in a zero grade. Your portfolio submission for this task should be an HTML export of your IPYNB Jupyter notebook that details your investigation, using appropriate code cells to perform the required analysis and Markdown cells to explain your work.
As a cyber security analyst, you have been provided with a set of logs related to your organisation's web server. You will need to analyse these logs and seek out suspicious activity based on the data available.
More information about Microsoft Internet Information Services (IIS) can be found at the following URL: https://docs.microsoft.com/en-us/previous-versions/iis/6.0-sdk/ms525410(v=vs.90)
Dataset: Please see the folder "Portfolio Assignment" under the Assignment tab on Blackboard for further detail related to the access and download of the necessary dataset.
Hint: The TryHackMe room "HTTP in detail" may help you decide what to investigate within this large dataset.
Criteria | 0-39 | 40-49 | 50-59 | 60-69 | 70-84 | 85-100 |
---|---|---|---|---|---|---|
Identification of the suspicious activity (25%) | No evidence of progress | A limited attempt to address this criterion | Some correct detail is identified but perhaps not all | Most correct detail is identified | All correct detail identified with good justification | All correct detail identified with excellent justification |
Analytical reasoning and process (25%) | No evidence of progress | A limited attempt to address this criterion | Some evidence of analysis but perhaps some flaws in the approach | Evidence of analysis with only some minor flaws | Very good analytical approach | Excellent analytical approach |
Python and Pandas proficiency (25%) | No evidence of progress | A limited attempt to address this criterion | Some fair usage but perhaps not optimal | Good usage of Python and Pandas with only minor flaws | Very good usage of in-built functions | Excellent professional usage of in-built functions |
Clarity and professional report presentation (25%) | No evidence of progress | A limited attempt to address this criterion | Some evidence of markdown commentary but with major flaws | Markdown commentary with only minor flaws | Very good detail in markdown commentary | Excellent detail in markdown commentary |
To achieve the higher end of the grade scale, you need to demonstrate how you have conducted your investigation and identified the malicious activity, and show a good command of Pandas data analysis throughout.
Your submission for this task should include:
Your final portfolio should be submitted to Blackboard by 14:00 on 12th May 2022. Your Blackboard submission should consist of the following individual files:
Please do not ZIP the files together as a single submission on Blackboard; you can submit multiple files to Blackboard.
For each criterion, please reflect on the marking rubric and indicate what grade you would expect to receive for the work that you are submitting. For your own personal development and learning, it is important to reflect on your work and to attempt to assess it carefully. Do think carefully about both the positive aspects of your work and any limitations you may have faced.
Identification of the suspicious activity (25%): You estimate that your grade will be __.
Analytical reasoning and process (25%): You estimate that your grade will be __.
Python and Pandas proficiency (25%): You estimate that your grade will be __.
Clarity and professional report presentation (25%): You estimate that your grade will be __.
Please provide a minimum of two sentences to comment and reflect on your own self-assessment: __. __.
Questions about this assignment should be directed to your module leader (Phil.Legg@uwe.ac.uk). You should use the online Q&A form to ask questions related to this module and this assignment, as well as utilising the on-site teaching sessions.
# Import libraries as required
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.set_option('display.max_rows', 10)
In the cell below, you will need to change `data_file` to your own specific data filename. The example data file is purely to demonstrate some initial steps for your investigation and should not be used.
data_file = 'example_data'
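# Load in the data set as required
data_path = './example_data/'
# sep=r'\s+' splits on any run of whitespace (equivalent to the deprecated delim_whitespace=True)
data = pd.read_csv(data_path + data_file, sep=r'\s+')
#data.to_csv('out.csv')
# The first header token appears to be an extra label (e.g. a '#Fields:' marker),
# which shifts every column name one place to the right of its values.
# Drop the last (empty) column, then re-label the rest with the names from position 1 onward.
temp_df = data[data.columns[:-1]]
temp_df.columns = data.columns[1:]
data = temp_df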
data
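To see why the columns are re-labelled in the loading cell, here is a minimal sketch on a two-row in-memory sample. It assumes the header line begins with a `#Fields:` marker, as in W3C extended log files. The sample rows and the choice of fields are invented for illustration and are not taken from any assigned dataset.

```python
import io
import pandas as pd

# Invented sample: the header has one more token ('#Fields:') than each data row
sample = (
    "#Fields: date time cs-method sc-status time-taken\n"
    "2022-01-01 00:00:01 GET 200 15\n"
    "2022-01-01 00:00:02 POST 404 31\n"
)

raw = pd.read_csv(io.StringIO(sample), sep=r"\s+")
# At this point the names are shifted: raw['#Fields:'] holds the dates,
# raw['date'] holds the times, and the last column is entirely NaN.

# Drop the empty last column and re-label with the names from position 1 onward
fixed = raw[raw.columns[:-1]].copy()
fixed.columns = raw.columns[1:]

print(fixed)
```

After the shift, each value sits under the field name it belongs to, so `fixed['sc-status']` holds the status codes rather than the request methods.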
# Get all column names
data.columns
# Search for all unique entries in 'cs(Referer)'
data['cs(Referer)'].unique()
# Get count of each unique value for 'cs(Referer)'
data['cs(Referer)'].value_counts()
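The same counting idea extends to other fields and to combinations of fields. The sketch below runs on a small synthetic DataFrame, not the assignment data. The column names `c-ip` and `sc-status` are assumed W3C-style field names and may differ in your own dataset, so check `data.columns` first.

```python
import pandas as pd

# Synthetic stand-in for the log data; column names are assumptions
demo = pd.DataFrame({
    "c-ip": ["10.0.0.1", "10.0.0.2", "10.0.0.2", "10.0.0.2", "10.0.0.3"],
    "sc-status": [200, 404, 404, 500, 200],
})

# Proportion of each status code instead of raw counts
status_share = demo["sc-status"].value_counts(normalize=True)

# Requests per client, broken down by status code
per_client = pd.crosstab(demo["c-ip"], demo["sc-status"])

# Clients ranked by number of error responses (status >= 400)
errors_by_ip = (
    demo[demo["sc-status"] >= 400]
    .groupby("c-ip")
    .size()
    .sort_values(ascending=False)
)

print(status_share)
print(per_client)
print(errors_by_ip)
```

A cross-tabulation like this makes it easier to spot a single client responsible for a disproportionate share of one kind of response, which a flat `value_counts()` on one column would hide.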
# Plot the first 100 values for the 'time-taken' column
plt.figure(figsize=(20,5))
plt.plot(data['time-taken'][:100])
plt.show()
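A line plot of the first 100 values only shows a small window of the data. As a complementary sketch, summary statistics and a quantile threshold can flag unusually slow requests across a whole column. The values below are synthetic, and the 0.9 quantile is an arbitrary cut-off chosen for this example rather than a recommended setting.

```python
import pandas as pd

# Synthetic 'time-taken' values with one obvious outlier
demo = pd.DataFrame({"time-taken": [12, 15, 11, 14, 13, 950, 16, 12, 15, 14]})

# Summary statistics give a quick sense of the typical range
summary = demo["time-taken"].describe()

# Flag rows above a chosen quantile (0.9 is an arbitrary choice for this sketch)
threshold = demo["time-taken"].quantile(0.9)
slow = demo[demo["time-taken"] > threshold]

print(summary)
print(slow)
```

Rows selected this way are only candidates for closer inspection. A long response time can have benign causes, so it is worth looking at the other fields of those rows before drawing conclusions.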
Carry on with the investigation based on the initial code provided above. Conclude your investigation with a summary of what activities you have identified and why they are deemed to be suspicious.
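One possible way to continue, shown here only as a hedged sketch, is to build a proper timestamp and look at request volume over time. The example uses synthetic rows and assumes separate `date` and `time` columns holding strings in `YYYY-MM-DD` and `HH:MM:SS` form. Your own dataset may name or format these fields differently.

```python
import pandas as pd

# Synthetic rows; 'date' and 'time' column names and formats are assumptions
demo = pd.DataFrame({
    "date": ["2022-01-01"] * 5,
    "time": ["00:00:05", "00:00:20", "00:00:40", "00:01:10", "00:03:30"],
})

# Combine the two string columns into a single datetime column
demo["timestamp"] = pd.to_datetime(demo["date"] + " " + demo["time"])

# Count requests in each one-minute bin (empty bins appear with a count of 0)
per_minute = demo.set_index("timestamp").resample("1min").size()

print(per_minute)
```

Bursts of requests in a short interval stand out in a series like `per_minute`, and the same resampling can be applied after filtering to a single client or status code.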