In Progress

522512 unix shell scripting

Background

In this session, you will write a script that processes a log file and extracts useful summaries from it, a task that is very common in security and system administration work. The log file in question is a web server log from thttpd, a small, portable and secure web server. It follows a standard format that is very common in the UNIX world: a flat text file with one record per line and fields delimited by whitespace or other characters.

The task is to write a script that answers the questions below. The idea is that sysadmins will use this script to extract and summarise relevant information about the operation of the web server, so that they can check for anomalies, i.e. things that stand out from the ordinary, and investigate further if necessary.

The log file consists of lines, where each line represents one access request. It largely follows the Apache combined log format. The fields describe the following: the IP number the request originated from; the ident answer from the originator (always '-'); the username of the requester as determined by HTTP authentication; the date and time the request was processed; the request line as it was received from the client, in double quotes; the HTTP status code that was sent to the client; the number of bytes transferred to the client; the referer page (from the client); the user agent string (from the client). See e.g. the Apache documentation for more details ([url removed, login to view]). Note that the fields are not well delimited (i.e. there is no reserved character to separate fields); this is unfortunately common in log files and a problem you have to contend with.

The task

You will write a shell script (and a Python program if you so choose) that reads a log file and answers the following questions:

• Which IP addresses make the most connection attempts?

• Which IP addresses make the most successful connection attempts?

• What are the most common result codes and where do they come from (IP number)?

• What are the most common result codes that indicate failure (no auth, not found etc.) and where do they come from?

• Which IP numbers get the most bytes sent to them?
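As a rough illustration of how the questions above might be approached in Python, the sketch below parses lines with a regular expression and tallies per-IP counts. The pattern, the function names (parse_line, summarise), the sample lines and the rule that "successful" means a 2xx status code are all assumptions made for this example; they are not part of the assignment text, and real thttpd logs may need a more tolerant pattern.

```python
import re
from collections import Counter

# Assumed pattern for a combined-format line. The request is matched lazily
# up to the first quote that is followed by a status code and a byte count,
# because the request itself may contain spaces or quotes.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>.*?)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical sample lines (documentation-range IP addresses).
SAMPLE_LINES = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/4.08"',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /missing HTTP/1.0" 404 - "-" "curl"',
    '192.0.2.1 - - [10/Oct/2000:13:57:10 -0700] "GET /a.png HTTP/1.0" 200 100 "-" "Mozilla/4.08"',
]


def parse_line(line):
    """Return a dict of the fields used here, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    size = match.group("size")
    return {
        "ip": match.group("ip"),
        "time": match.group("time"),
        "status": int(match.group("status")),
        "bytes": 0 if size == "-" else int(size),  # '-' means no body was sent
    }


def summarise(lines):
    """Tally attempts, successful attempts (assumed: 2xx) and bytes sent per IP."""
    attempts, successes, sent = Counter(), Counter(), Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip lines that do not fit the assumed format
        attempts[record["ip"]] += 1
        if 200 <= record["status"] < 300:
            successes[record["ip"]] += 1
        sent[record["ip"]] += record["bytes"]
    return attempts, successes, sent


if __name__ == "__main__":
    attempts, successes, sent = summarise(SAMPLE_LINES)
    # most_common(None) returns every entry, largest count first.
    for ip, count in attempts.most_common(None):
        print(ip, count)
```

Counter.most_common(n) already returns entries sorted with the largest count first, so a cap on the number of results can be passed straight through as n.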

For all of the above, the answer should be in the form of a sorted list with the largest number first (i.e. topmost). The user should be able to cap all answers, i.e. by asking for only the 'x' topmost results. With no cap, all available answers should be presented.

All questions should also be able to be qualified by a time range, i.e. the last number of hours or days. You are allowed to assume that hours will be less than 24 and that the day changes at midnight (so “the last 24 hours” and “the last day” don't necessarily refer to the same period of time: the hours count back from now, and the day counts back until the last midnight). All times are relative to the last time in the log file, not the time when the log analysing program is run. Queries without a range specified are assumed to refer to the time span represented in the whole log file.

The arguments to the shell script should be coded as such:

log_sum(.sh|.py) [-n N] [-h H|-d D] [-c|-2|-r|-F|-t|-f] <filename>

log_sum is the name of the script; .sh means a Bourne shell script, .py means a Python script.

-n: Limit the number of results to N

-h: Limit the query to the last number of hours (< 24)

-d: Limit the query to the last number of days (counting from midnight)

-c: Which IP addresses make the most connection attempts?

-2: Which IP addresses make the most successful connection attempts?

-r: What are the most common result codes and where do they come from?

-F: What are the most common result codes that indicate failure (no auth, not found etc.) and where do they come from?

-t: Which IP numbers get the most bytes sent to them?
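The -h and -d time ranges could be computed along the following lines. This is only a sketch under stated assumptions: the timestamp layout (day/month/year:time plus a UTC offset, as in Apache-style logs), the helper names (parse_time, window_start, in_window) and the reading of "-d D" as "from the midnight D-1 days before the most recent midnight" are choices made for this example, since the text leaves the exact meaning of several days open.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout, e.g. "10/Oct/2000:13:55:36 -0700".
# Note that %b depends on the locale; English month names are assumed.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_time(text):
    """Turn the text between the square brackets into an aware datetime."""
    return datetime.strptime(text, TIME_FORMAT)


def window_start(last, hours=None, days=None):
    """Return the earliest timestamp to keep, or None for 'no limit'.

    'last' is the latest time found in the log, not the current clock time.
    """
    if hours is not None:
        return last - timedelta(hours=hours)
    if days is not None:
        # Assumption: day 1 starts at the most recent midnight before 'last'.
        midnight = last.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight - timedelta(days=days - 1)
    return None


def in_window(stamps, hours=None, days=None):
    """Keep only the timestamps that fall inside the requested range."""
    if not stamps:
        return []
    start = window_start(max(stamps), hours, days)
    return [t for t in stamps if start is None or t >= start]
```

With these definitions, "-h 23" and "-d 1" can select different records for the same log, which matches the remark that the last hours and the last day need not cover the same period.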

<filename> refers to the log file. If '-' is given as a filename, or no filename is given, then standard input should be read. This enables the script to be used in a pipeline.

In the above, '|' denotes a choice (one of), and '[]' denotes that everything within the square brackets is optional. So e.g. “(.sh|.py)” means that the two allowed forms of the program name are log_sum.sh and log_sum.py. “[-h H|-d D]” means that either '-h' or '-d' or neither (choice within optional brackets) can be given, but not both.

The output format should be in the form:

-c: [url removed, login to view] yyy where yyy is the number of connection attempts

-2: [url removed, login to view] yyy where yyy is the number of successful attempts

-r: yyy [url removed, login to view] where yyy is the result code, one ip per line

-F: yyy [url removed, login to view] where yyy is the result code indicating failure, one ip per line

-t: [url removed, login to view] yyy where yyy is the number of bytes sent from the server

-f: [url removed, login to view] yyy where yyy is the number of bytes sent to the server (REMOVED)

[url removed, login to view] is the IP address in dotted quad form. yyy is an integer (with between 1 and the required number of digits, i.e. not just three digits). You are allowed to insert tabs and/or whitespace between the elements of the output according to taste (but not within the elements, i.e. within an IP address). For -F and -r you are allowed to output multiple lines with the same result code or count, but the groups must still be sorted. There should still be only one IP address per line.
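One possible way to produce the grouped -r and -F output is sketched below. Several points are assumptions for the sake of the example and are not fixed by the text: failure is taken to mean a status code of 400 or above, the cap counts output lines rather than result codes, ties between equally common codes are broken by the numeric code, and the function name result_code_lines is hypothetical.

```python
from collections import Counter


def result_code_lines(records, failures_only=False, limit=None):
    """Build 'code<TAB>ip' lines, most common result code first.

    records is an iterable of (ip, status) pairs. Within one result code,
    the IP addresses that produced it most often come first.
    """
    per_code = Counter()   # how often each result code occurs overall
    per_pair = Counter()   # how often each (code, ip) combination occurs
    for ip, status in records:
        if failures_only and status < 400:  # assumed definition of failure
            continue
        per_code[status] += 1
        per_pair[(status, ip)] += 1

    def sort_key(item):
        (status, ip), count = item
        # Negative counts give descending order; code and ip break ties.
        return (-per_code[status], status, -count, ip)

    ordered = sorted(per_pair.items(), key=sort_key)
    lines = [f"{status}\t{ip}" for (status, ip), _count in ordered]
    return lines if limit is None else lines[:limit]
```

Each IP address appears at most once per result code, so the one-address-per-line rule holds and lines with the same code stay together as a sorted group.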

Example usage:

$[url removed, login to view] -n 10 -c [url removed, login to view]

[url removed, login to view] 2438

[url removed, login to view] 1202

[url removed, login to view] 731

[url removed, login to view] 591

[url removed, login to view] 480

[url removed, login to view] 429

[url removed, login to view] 400

[url removed, login to view] 336

[url removed, login to view] 336

[url removed, login to view] 272

That is to say, we're asking for a sorted list of the ten IP addresses that originate the most connections, most prolific first. The above example usage can be used as a test case, as it is correct given the [url removed, login to view] file you have been supplied with.
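For the Python variant, the command-line syntax described above could be handled with the standard argparse module, roughly as follows. The function names (build_parser, open_input) and the destination names are assumptions for this sketch. The -f option is left out because it is marked as removed, and the query flag is left optional because the usage line puts it in square brackets.

```python
import argparse
import sys


def build_parser():
    # add_help=False frees '-h', which argparse would otherwise claim for help.
    parser = argparse.ArgumentParser(prog="log_sum.py", add_help=False)
    parser.add_argument("-n", type=int, metavar="N", dest="limit", default=None)

    # '-h' and '-d' may not be combined: either one, or neither.
    span = parser.add_mutually_exclusive_group()
    span.add_argument("-h", type=int, metavar="H", dest="hours")
    span.add_argument("-d", type=int, metavar="D", dest="days")

    # At most one query flag; all of them write to the same destination.
    query = parser.add_mutually_exclusive_group()
    for flag, name in (
        ("-c", "connections"),
        ("-2", "successes"),
        ("-r", "results"),
        ("-F", "failures"),
        ("-t", "bytes_sent"),
    ):
        query.add_argument(flag, dest="query", action="store_const", const=name)

    # A missing filename and '-' both mean standard input.
    parser.add_argument("filename", nargs="?", default="-")
    return parser


def open_input(filename):
    """Return standard input for '-', otherwise the opened log file."""
    if filename == "-":
        return sys.stdin
    return open(filename, encoding="utf-8", errors="replace")


if __name__ == "__main__":
    print(build_parser().parse_args())
```

argparse reports an error and exits if both -h and -d are given, which matches the "but not both" rule. The Bourne shell variant would need its own option handling, for example with getopts.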

Presentation and limitations

The assignment should be presented in person, with both group members in attendance, in a lab slot. You will be required to show the code, let me run it with the test file you have been given, and be prepared to discuss your solution.

You're allowed to use all tools available to the shell script programmer, except that you're not allowed to use awk for any major processing: using awk for printing, field selection etc. is OK, but writing the “shell script” part of the lab completely in awk and just calling that from a Bourne shell script is not OK (use Python instead). You're allowed to use the full capabilities of the Python language, together with the official standard libraries of the CPython distribution.

The [url removed, login to view] file is attached to the email.

Detailed documentation of the code must be provided and explained.

Skills: Anything Goes, Apache, Article Rewriting, PHP, Python, Shell Script, System Admin, UNIX, Web Security


Project ID: #2268450