Bioinformatics programming in Python (easy)

Problem 1

Let's say I have a file named peaks.txt:

Chr1 7 9 4.5 5.5

chr10 6 9 3.5 4.5

chr1 10 6 2.5 4.4

The question is: how can I sort the file so that it looks like this?

Chr1 7 9 4.5 5.5

chr1 10 6 2.5 4.4

chr10 6 9 3.5 4.5
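
One possible way to get that ordering, as a minimal sketch: it assumes the rows should be ordered by chromosome name only, ignoring case and comparing the numeric part as a number, with rows on the same chromosome kept in their original order. The names `chrom_key` and `sort_peaks` are made up for illustration.

```python
import re

def chrom_key(line):
    """Sort key built from the first column, case-insensitive and number-aware."""
    chrom = line.split()[0].lower()
    # "chr10" -> ["chr", 10, ""], so chr2 would sort before chr10
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", chrom)]

def sort_peaks(lines):
    """Return the non-blank rows ordered by chromosome."""
    # sorted() is stable: rows with equal keys keep their input order
    return sorted((ln for ln in lines if ln.strip()), key=chrom_key)

rows = ["Chr1 7 9 4.5 5.5", "chr10 6 9 3.5 4.5", "chr1 10 6 2.5 4.4"]
sorted_rows = sort_peaks(rows)
for row in sorted_rows:
    print(row)
```

For the real file, the list of rows could come from reading peaks.txt line by line instead of the hard-coded `rows`.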

Next, how do I extract the p-values (i.e. 7, 9, 10, 6, 6, 9)?

After extracting all the p-values, I want them grouped by chromosome: for example, the p-values from chr1 are 6, 7, 9 and 10, and those from chr10 are 6 and 9.
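
A rough sketch of that grouping step, assuming the p-values are the second and third columns of each row and that "Chr1" and "chr1" refer to the same chromosome. The function name `positions_by_chrom` is hypothetical.

```python
from collections import defaultdict

def positions_by_chrom(lines):
    """Collect columns 2 and 3 of every row, grouped by lower-cased chromosome."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete rows
        groups[fields[0].lower()].extend(int(v) for v in fields[1:3])
    # Sort each group so the values come out in ascending order
    return {chrom: sorted(values) for chrom, values in groups.items()}

rows = ["Chr1 7 9 4.5 5.5", "chr10 6 9 3.5 4.5", "chr1 10 6 2.5 4.4"]
grouped = positions_by_chrom(rows)
print(grouped)
```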

So, for example, if the p-value is 7 from chr1, I would open a file called [url removed, login to view], which looks like this:





and I will extract the subsequence TACTA. Basically, the p-value (in this case 7) is a position counted from the second line of the [url removed, login to view] file, and I print the subsequence running from position 7-d to position 7+d, where d=2. Likewise, if the p-value is taken from chr10, then we read from a file named [url removed, login to view], which can look like this:





So the question is: how do I do this for all the p-values (i.e. all the p-values from chr1 and all the p-values from chr10), given that we don't know how many lines the [url removed, login to view] files have?
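
A minimal sketch of the window extraction, under stated assumptions: the first line of each chromosome file is a header, the remaining lines are pieces of one continuous sequence, and positions are 1-based. The helper names and the demo data are made up; `io.StringIO` only stands in for a real file opened with `open()`.

```python
import io

def read_sequence(handle):
    """Return the sequence as one string, skipping the first (header) line."""
    next(handle, None)  # assumption: the sequence starts on line 2
    # Iterating the handle reads line by line, so the line count need not be known
    return "".join(line.strip() for line in handle)

def window(seq, pos, d=2):
    """Subsequence from position pos-d to pos+d, with 1-based positions."""
    start = max(pos - d - 1, 0)  # clamp so a peak near the start cannot wrap around
    return seq[start:pos + d]

# Made-up stand-in for a chromosome file; not the real data
demo_file = io.StringIO(">demo\nACGTA\nCGTAC\n")
demo_seq = read_sequence(demo_file)
demo_windows = {pos: window(demo_seq, pos) for pos in (6, 7, 9, 10)}
print(demo_windows)
```

Windows near either end of the sequence come out shorter than 2d+1 letters in this sketch; whether to pad or skip them is a separate decision.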

And how do I output the results to a file in the following format?


peak value 6: TTGTA

peak value 7: TACTA

and so on for all the p-values of chr1


peak value 7: TTACT

and so on...
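
The output format above could be produced along these lines. This is only a sketch: the chromosome name printed before each group is an assumption about what introduces each section, and `format_report` / `write_report` are hypothetical names.

```python
def format_report(windows_by_chrom):
    """Build the report text from {chromosome: [(p_value, subsequence), ...]}."""
    lines = []
    for chrom, peaks in windows_by_chrom.items():
        lines.append(chrom)  # assumed section heading
        for pos, subseq in peaks:
            lines.append(f"peak value {pos}: {subseq}")
        lines.append("")  # blank line between chromosomes
    return "\n".join(lines)

def write_report(path, windows_by_chrom):
    """Write the formatted report to the given output file."""
    with open(path, "w") as out:
        out.write(format_report(windows_by_chrom))

report = format_report({"chr1": [(6, "TTGTA"), (7, "TACTA")]})
print(report)
```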

Problem 2: after generating the result, I want to look for the number of occurrences of AGAACA or TGTTCT in each of the sequences generated for each p-value. Plot a histogram (binned at a fixed interval) together with a line graph. If d=2, then the x-axis runs from -2 to 2, so some conversion of values is needed.
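
A hedged sketch of the counting and the axis conversion using numpy. It assumes every window has the full length 2d+1, records where each motif match starts, and converts that start index to an offset from the peak by subtracting d. The demo uses d=5 and made-up windows, because a 6-letter motif cannot fit in the 5-letter windows that d=2 produces. The function names are hypothetical.

```python
import numpy as np

MOTIFS = ("AGAACA", "TGTTCT")

def motif_offsets(subseq, d, motifs=MOTIFS):
    """Start offsets (relative to the peak, so -d..d) of every motif match."""
    offsets = []
    for motif in motifs:
        start = subseq.find(motif)
        while start != -1:
            offsets.append(start - d)  # index 0 of the window maps to -d
            start = subseq.find(motif, start + 1)  # allow overlapping matches
    return offsets

def offset_histogram(windows, d):
    """Count matches per offset, using one bin of width 1 per position."""
    all_offsets = [off for w in windows for off in motif_offsets(w, d)]
    edges = np.arange(-d - 0.5, d + 1.5, 1.0)  # bin edges centred on integers
    counts, _ = np.histogram(all_offsets, bins=edges)
    centers = np.arange(-d, d + 1)
    return centers, counts

# Made-up windows of length 2*5+1 = 11
demo = ["GGAGAACATTT", "TGTTCTAAAAA", "CCCCCCCCCCC"]
centers, counts = offset_histogram(demo, d=5)
print(centers.tolist(), counts.tolist())
```

The `centers` and `counts` arrays could then be handed to a plotting library, for example matplotlib's `bar` for the histogram and `plot` for the line graph on the same axes.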

Take note that I am working in bioinformatics, so I hope you can make use of NumPy.

Note that the above two problems are for Python.

The following problem (Problem 3) is for Microsoft Access.

How do I export the output of a query directly to an existing Excel file? The data must be written starting at a specific cell.

Take note that bidders need not bid on all the problems. However, Problems 1 and 2 are to be bid on together, while Problem 3 can be bid on separately.


The deadline is next week, since it's just a simple project.

