Hello!
I am a graduate student taking a Bioinformatics course that has been condensed into one month. My experience with Python is minimal; I was introduced to it about 2 days ago. I was wondering if you could write a program that can manipulate text, specifically DNA sequence data. I have tried and been somewhat successful, but my code is not very accurate and is very wordy.
As you may already know, DNA sequencing can return low-quality reads. Good DNA is made up of As, Ts, Gs, Cs and a few Ns. Low-quality DNA is when the sequence begins, and sometimes also ends, with a string of Ns. I need a program that will delete those strings of Ns, including any good bases (A, T, G, C) scattered within them, and print out the good sequence. However, single and double Ns (N, NN) must remain within the “good sequence.” Ideally the sequence should begin with one of the 4 good letters (A, T, G, C). Hopefully the examples below will clear up any confusion. The portions between the # marks are what I want removed (they contain a lot of Ns), and the unaffected portion is what I would like the program to return (print).
Bad DNA examples:
>10F_PREMIX Sample_Name=10F_PREMIX Chromat_id=6743560 Read_id=6682623Version=1 Length=1059
#GNNNNNNNNNTNNNTNNGN#TGATAATTGCAATCATCTATCCCTAACACGATGCAATTTGCAAGATTTCCCAAACCTTTCGGCCAAGGATCATACTCGTTGTTTGCATCATTGTAGCGCGCGTGCGGCCCAGAACATCTAAGGGCATCACAGACCTGTTATCGCCTCAAACCTCCATCCGCTTTTTATCACGGATCGTCCCGCTAAGTAGTTTTATATTTCTCCTTGCATGTTGCCATGCGAGGAACTAATCAGCAAGGTTAAGGTCTCGTTCGTTAACGGAATTAACCAGACAAATCACTCCACCAACTAAGAACGGCCATGCACCACCACTCATAGAATCAAGAAAGAGCTCTCAATCTGTCAATCCTTCCTATGTCCGGACCTGGTAAGTTTTCCCGTGTTGAGTCAAATTAAGCCGCAGGCTCCACGCCTGGTGGTGCCCTTCCGTCAATTCCTTTAAGTTTCAGCCTTGCGACCATACTCCCCCCAGAACCCAAAGACTTTGATTTCTCATAGAGTGCTGATAGAGTCGTCGTTGATACATCCACCAATCCTCAGTCGGCATAGTTTATGGTTAAGACTACGACGGTATCTAATCGTCTTCGATCCCCTAACTTTCGTTCTTGATCAATGAAAACATCCTTGGCAGATGCNCNC#NNNNNNNNNNNNNNNNNANCACNNTGNNNCTTGGATTGCNTGGANTGACTNTNGNNANGNGGNNNNAANCCNNNGNATAAATGGTGGTAGANTTGNCCNCAGNNNNGNATNGCNCGAGCGNGNTNGNTCCNGNNNGCCNNTGNAAGATATNTNTCNANCTANGTANCCNNNCNTCGGAGNTNGNGCTTNNNGNNNNNNNNNNNANANNANNNTGACCGNTNNNTNNNTNNANNNNANAAGANTATANTCGNACCNGNNCTANCNNNNGNNNNGNCCNNNANNGAGGNNNNNACNNNNAACNTTN#
>7R_PREMIX Sample_Name=7R_PREMIX Chromat_id=6743715 Read_id=6682773Version=1 Length=675
#GGNNNNNNNNNNNNNTNNNN#ACGAAGTTAGGGGANCGAAGACGATTAGATACCGTCGTAGTCTTAACCATAAACTATGCCGACTGAGGATTGGTGGATGTATCAACGACGACTCTATCAGCACTCTATGAGAAATCAAAGTCTTTGGGTTCTGGGGGGAGTATGGTCGCAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGCGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTTACCAGGTCCGGACATAGGAAGGATTGACAGATTGAGAGCTCTTTCTTGATTCTATGAGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGTTAACGAACGAGACCTTAACCTTGCTGATTAGTTCCTTGCATGGCAACATGCCAGGATAAATATAAAACTGCTTAGCGGGACGATCCGTGATAAAAAGCGGATGGAGGTTTGAGGCGATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACAATGATGCAAACAACGAGTATGATCCTTGACCGAAAGGTCTGGGAAACCTTGCAAATTGCATCGTGTTAGGGATAGATGATTGCAATTATTCATCTTGAACGAGGAATGCCTTGTAAGCGTGAGTCATCAACTCNNCGCTGAATN
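For reference, here is a minimal sketch of one way the trimming described above could be done. It is not a definitive solution. It assumes that a "string of Ns" means 3 or more Ns in a row (so N and NN are kept), that the good region is the longest stretch between such runs, and that the # marks and the > header lines have already been removed. The function name trim_low_quality and the min_run parameter are made up for illustration.

```python
import re


def trim_low_quality(seq, min_run=3):
    """Return the longest stretch of seq with no run of min_run or more Ns.

    Assumption: the good region is the longest piece left after cutting
    the read at every run of min_run or more consecutive Ns.
    """
    seq = seq.strip().upper()
    # Cut the read wherever min_run or more Ns appear in a row.
    pieces = re.split("N{%d,}" % min_run, seq)
    # Keep the longest piece; shorter pieces are treated as low quality.
    best = max(pieces, key=len)
    # Remove any single or double Ns left at the edges so the result
    # begins and ends with A, T, G or C.
    return best.strip("N")


if __name__ == "__main__":
    read = "GNNNNNTNNNNACGTNACGGTNNCATTNNNNNANNNNT"
    print(trim_low_quality(read))
```

Single and double Ns inside the kept region are left alone because only runs of 3 or more are used as cut points. Note that this rule is looser than the hand-marked examples: on the first read above it would keep the short TNNGN stretch just before TGATAATTGC, since that stretch has no run of 3 Ns. A stricter rule, for example one based on how many Ns fall inside a window of bases, would be needed to match the marks exactly.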
Bioinformatics with Python
Relevant Skills and Experience
Python
Biomedical computational Research
Algorithm
Data Science
Machine Learning
Proposed Milestones
$40 USD - final