The amino acid sequence of a protein can be written as a string of letters, each representing one amino acid residue. For example, human insulin is a protein of 51 amino acid residues, made up of two chains, the A-chain (21 residues) and the B-chain (30 residues), with the following amino acid sequences:
A-Chain – GIVEQCCTSICSLYQLENYCN
B-Chain – FVNQHLCGSHLVEALYLVCGERGFFYTPKT
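As a quick check of the residue counts, the two sequences can be stored as ordinary Python strings and measured with the built-in len function. The variable names a_chain and b_chain are chosen here for illustration only.

```python
# The two insulin chains as plain Python strings (one letter per residue).
a_chain = "GIVEQCCTSICSLYQLENYCN"
b_chain = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"

# len() gives the number of residues in each chain.
total = len(a_chain) + len(b_chain)
print(len(a_chain), len(b_chain), total)  # prints: 21 30 51
```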
Using some of the basic functions in Python, we can write a simple programme that finds an amino acid residue of interest and reports its positions and number of occurrences within the string.
For example, we first input the sequence of the insulin B-chain and then specify phenylalanine (represented by the letter ‘F’) as the residue of interest. The result shows ‘F’ occurring three times in this string, at positions 1, 24 and 25.

The code for the programme is as follows:

The lines containing print("") simply print an empty line and are included to format the output.
Line 18 first asks the user to input the amino acid sequence as a string, and Line 19 then asks for the amino acid of interest. Line 21 calls the function aa_counter, which analyses the positions and number of occurrences of that amino acid.
Lines 1 to 15 constitute the aa_counter function. Line 4 computes the length of the input string and Line 5 returns it. Line 8 counts and returns the number of occurrences of the amino acid residue of interest.
To return the positions of the amino acid of interest, a for loop is used in Lines 13 to 15. Line 13 loops over every letter in the input string, and Line 14 checks each one against the letter of interest. If the letter at a particular position equals the letter of interest, the index of that position is returned. We add 1 to the index because Python indexing starts from zero rather than one.
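The logic described above can be sketched roughly as follows. This is not the original listing, so the line numbers in the text do not map onto it. Only the name aa_counter comes from the text. The parameter names, printed messages and return value are assumptions, and the sequence is hard-coded where the original programme would read it with input().

```python
def aa_counter(sequence, residue):
    """Report the length of `sequence` and where `residue` occurs in it."""
    # Total number of residues in the sequence.
    length = len(sequence)
    print("Sequence length:", length)
    print("")

    # Number of times the residue of interest occurs.
    occurrences = sequence.count(residue)
    print("Occurrences of " + residue + ":", occurrences)
    print("")

    # Visit every index and record the 1-based position of each match.
    positions = []
    for index in range(len(sequence)):
        if sequence[index] == residue:
            positions.append(index + 1)  # Python indices start at 0
    print("Positions:", positions)

    # Returning the values as well (an assumption) makes them easy to reuse.
    return length, occurrences, positions


# Hard-coded here so the sketch runs without user interaction.
b_chain = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
result = aa_counter(b_chain, "F")
```

For the insulin B-chain and ‘F’, this gives a length of 30, three occurrences, and positions 1, 24 and 25.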
Is there any way to make the above code more concise and elegant? Feel free to comment.
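One possible answer, offered only as a sketch: the built-in enumerate function can start counting from 1, so the positions can be collected in a single list comprehension, and the number of occurrences is then the length of that list. The function name find_residue is hypothetical.

```python
def find_residue(sequence, residue):
    """Return the 1-based positions of `residue` in `sequence`."""
    # enumerate(..., start=1) pairs each letter with its 1-based position.
    return [pos for pos, letter in enumerate(sequence, start=1) if letter == residue]


positions = find_residue("FVNQHLCGSHLVEALYLVCGERGFFYTPKT", "F")
print(len(positions), positions)  # prints: 3 [1, 24, 25]
```

This trades the step-by-step printing for a single return value, which is shorter but leaves the formatting of the output to the caller.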