What factors contribute to the proteome diversity? Write the name and one sentence description of five main post translational events in the cellular system.

Bioinformatics Exam

1. (a) What factors contribute to proteome diversity?

(b) Write the name and one sentence description of five main posttranslational events in the cellular system.

(d) Which analytical instrument is more effective in analyzing glycan structure?

2. Find the paper entitled ‘Structural basis for diverse N-glycan recognition by HIV neutralizing V1–V2–directed antibody PG16’. (a) What critical PG16 residues are involved in glycan binding? (b) Why is N-glycan important in HIV pathogenesis? (c) Look at the glycan structure below and name the type of this glycan (N-glycan or O-glycan), and explain why.

(d) Find the putative structures for the following m/z values (derived from mass spectrometry): 1171.78, 1315.70, 1345.90, 1375.92, 1416.95, 1519.95, 1550.05. [Use http://web.expasy.org/glycomod/] On the site, choose the following options: Monoisotopic; Mass tolerance: 0.5 dalton; Ion mode and adducts: Na+; N-linked oligosaccharide. Under ‘Monosaccharide residues’, check permethylated and keep only Hexose, HexNAc, DeoxyhexNAc and HexA as ‘possible’ (all other sugars will be ‘no’).
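As a cross-check on the GlycoMod output, the theoretical [M+Na]+ value of a candidate composition can be computed by hand. The sketch below is not GlycoMod's algorithm. It assumes full permethylation, a free (non-reduced) reducing end, and the monoisotopic residue masses listed in the code; those masses, the residue table, and the function names are assumptions that are not given in the exam and may need adjusting to match the options selected on the site.

```python
from itertools import product

# Assumed monoisotopic masses (Da) of fully permethylated residues.
RESIDUE = {"Hex": 204.0998, "HexNAc": 245.1263, "dHex": 174.0892, "HexA": 218.0790}
REDUCING_END = 46.0419  # assumed: CH3 and OCH3 capping a free reducing end
SODIUM = 22.9892        # assumed: Na+ adduct (sodium minus one electron)


def mz_sodiated(composition):
    """Theoretical [M+Na]+ for a dict such as {"Hex": 5, "HexNAc": 2}."""
    residues = sum(RESIDUE[name] * n for name, n in composition.items())
    return residues + REDUCING_END + SODIUM


def candidates(observed, tol=0.5, max_count=8):
    """Brute-force every composition within tol Da of an observed m/z."""
    names = list(RESIDUE)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        comp = {n: c for n, c in zip(names, counts) if c}
        if comp and abs(mz_sodiated(comp) - observed) <= tol:
            hits.append(comp)
    return hits
```

Calling `candidates(mz)` for each observed value gives compositions only. A composition does not fix a structure, so any hit still has to be compared with the GlycoMod suggestions and with known N-glycan biosynthesis.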

Use the Helicobacter adhesin protein (Accession: BAF38019.1). For this problem, use one or more of the following websites to address the questions:

Prosite (http://prosite.expasy.org/)

MotifScan (http://myhits.isb-sib.ch/cgi-bin/motif_scan)

NetPhos (http://www.cbs.dtu.dk/services/NetPhos/)

PRED-TMBB (http://biophysics.biol.uoa.gr/PRED-TMBB/)

Find all potential tyrosine phosphorylation sites.

Which program was more informative?

Which residues are predicted to be phosphorylated with >50% probability?

Find all short (high probability of occurrence) patterns found in the sequence.

Look for a secondary structure motif (e.g. zinc finger, leucine zipper). Which one is found?

What is the consensus pattern for that motif?

Give the numerical locations of the amino acid residues that fit the pattern.

A 2005 paper on AlpA (adhesion protein) (Accession: EJB15421.1) stated that NNPREDICT predicted this to be a beta-barrel protein. Predict transmembrane beta-barrels.
Which program did you select and why?
Is the transmembrane domain likely near the C-terminus or N-terminus?
How many transmembrane regions are predicted?

Find what HMMs (local models) are matched by this sequence.

Which program did you select and why?

What is the strongest E-value (local models)?

From the Pfam documentation, what alternating pattern suggests a particular role (location) for this class of proteins?

Retrieve gi|4096360 from the NCBI protein database.

What species? How many amino acids?

MPVPPPPPPPLPPPPPPLGAPPPPPPPGPPISTDAPSLRKSDLKGRSALLADIQQGTRLRKVTQINDRSAPQIESSKGTSKEGGAAGSNARGGSTPPALGDLFAGGFPVLRPAGQRDVAGGKTGQGPGSRAPSPRLPTKAISGPLPAPASPRLGNASDTHSSARPVPPRPSVPAPPPPTTPPPPPPPPPPPPPPPLPPASPIKAPSVSPPVPPTKGNPSAVPAPIPCVPPLPPPPPTPPPLPPASALSEKAVRPQLAPLHLPPIPPPLPLLPPYGYPALHSEPSSPAQDVREPPAPPPPPPPPPPPPPPPLPTYASCSPRAAVAPPPPPLPGSSNSGSETPPPLPPKSPSFQTQKALPTPPGAPGPQIILQKKRRGPGAGGGKLNPPPAPPARSPTTELSSKTQQPGGQLRNGGQHVIDDFESKFTFHSMEDFPPPDEYKPGQKIYPSKVPRSRTPGSWLQAEAAGQSSDDIKTRNSQLSLKALR

Looking at the sequence, what amino acid tends to be most prevalent?
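A visual impression of residue prevalence can be confirmed with a quick count. The following is a minimal sketch using only the Python standard library; the function name `residue_frequencies` is illustrative and not part of any tool named in the exam.

```python
from collections import Counter


def residue_frequencies(sequence):
    """Return (residue, count, fraction) tuples, most common residue first."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    counts = Counter(seq)
    return [(aa, n, n / len(seq)) for aa, n in counts.most_common()]
```

Pasting the protein sequence as a string and reading the first tuple of the result gives the most prevalent residue together with its share of the total length.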

Go to InterProScan (http://www.ebi.ac.uk/interpro/) and scan the protein sequence. Find the link that corresponds to the PRINTS database and open the PR link.

What is the title and accession number of this group of proteins?

These proteins account for how much of the dry weight of the cell wall?

Does the species in part A have cell walls?

Based on your answer, do you believe that the prediction is correct?

Go back to the InterProScan results. Click the link beginning with PF. What role does it play?

WASP proteins help control the polymerization of what cytoskeletal protein?

With what syndrome is the domain associated?

Click the Machesky and Insall 1998 literature reference in the Pfam summary. The Arp2/3 complex helps drive the formation of what cellular features?

Use the sequence given below to answer the following problems, using PROSITE, TMHMM or Phobius:

MMKPYTLDTGYPVYGAKFITKRTLLTAGGGGEGNNGIPNKLSGFRIDFTKVKAVQKFRELTLSANEDCPM

SLDAANNVILLGVNENTSSIKQGKNNHLRKFGYINHHLKYLDKQQLSNSRDNRDYQKFTHLSSDASVACI

ATSKVPTTIYVVDPQTLEKKFDLETGVDVKDMHISPDGKIVCYVCANSLHGYSTVTAKLLFKDDSFKNHT

LMKVKFLDQHHLLIVGSQKQGISLIHYSLAKNSIVNSRVISKKLKGVTSLDTRNGVIALSGNDNSLLLIK

VSNLKPIKQFNKIHKFSITSCCFNKTGDLLATVSAANTVSVMEIPKGLATKKSLPRKIFNYFILTVFMAI

LAIVLQWSIENGHLQHAWQKLLNGDVIDSSRYFKVESIPDEELSASLLESSYSGLSSETKSVTDAISASV

PIIETTVDTTTTSNSVRRTIPEGYSTAKEFQPDKIKLYKSSLDSDPSADSTGSFTVSITETQSSSIKPQK

SKKTKKLKKKTSSTATPGSSTALVETLIVSTSPQSVDTSLSVTSPVASSTAQDSYLTSSAHSALTTKGAK

KLKKKRKIKTKTVKTTKTTETETETETIVQDDLADITISSDIVSDISSEVSNSETVVQEGLPTEVIESLL

NKDPTVDEPIAEESAIIDLPLGKPVTEKLTLDTQEAIESSVDEPAIGIDIEDAIVADEKEEPLTADDAVL

DGEEREIEDPDEIPSEVLTDVADSPEEYQDDININNSQEATSTHDSNVSEETAAPVEETQESVGTDDQAE

PEKFDEEEKTENEHQYEKDDHSELAEETQESHDDQVDEVEEVLEEVKDFLGETKDVLEEAKFVLEEAQEG

AEVSDKTELTEQDETEKFIESSDALRSSVGTVEATEQTTATSVDPNTEIDTKTTSEAVIESNTPVENEES

TVEDREINSSIDEIAEKSISQEEIVSKLEDSSSENNEGEREEEEIEESEDEKAGDIEEEEEEYEDDDDEY

ENVGTEEEKDEEEVQGDHEEQEDEEEVDDEEEEDDDEDEEDEDEDEDEEGENINHDEL

Find all long (not high-frequency match) patterns that match this sequence. List the accession number and the description for each.

Why is there no E value for this pattern?

Interpret the regular expression for this consensus pattern (> means “at the C-terminus”).
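PROSITE patterns use their own notation, and translating one into an ordinary regular expression is a useful way to check an interpretation. The sketch below covers the common syntax elements only: `-` separators, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, and `<` or `>` as terminus anchors. It is not the ScanProsite implementation, it does not handle an anchor written inside square brackets, and the patterns in the usage note are illustrative examples rather than the answer to this problem.

```python
import re

# Single-pass character mapping from PROSITE syntax to Python regex syntax.
_PROSITE_TO_REGEX = str.maketrans({
    "-": "",    # element separator
    "x": ".",   # any residue
    "{": "[^",  # {ABC}: any residue except A, B or C
    "}": "]",
    "(": "{",   # x(2,4): repeat two to four times
    ")": "}",
    "<": "^",   # pattern anchored at the N-terminus
    ">": "$",   # pattern anchored at the C-terminus
})


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern (optional trailing period) to a regex."""
    return pattern.rstrip(".").translate(_PROSITE_TO_REGEX)


def match_positions(pattern, sequence):
    """1-based (start, end) of every match, overlapping matches included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1)))
            for m in regex.finditer(sequence)]
```

For example, `prosite_to_regex("[ST]-x(2)-[DE]")` yields `[ST].{2}[DE]`, and a pattern ending in `>` such as `[KR]-x-E-L>` becomes `[KR].EL$`, which can only match the last four residues of a sequence.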

Which program(s) would you choose to find transmembrane domains?

Where is the transmembrane domain located?

What is the predicted location (inside or outside) of the C-terminal domain?

Attached is the file dpp4.fst. It contains dipeptidyl-peptidase 4 mRNA sequences from various organisms. Use MEGA to align the sequences with MUSCLE so that you can build phylogenetic trees.

Build a Maximum Parsimony tree and a Maximum Likelihood tree. Which one best fits the dppimage file (attached)?

Submit the ML tree.

A specific gene’s phylogeny does not always exactly match species phylogeny. Give two reasons why that could happen.

The paper from which the image was selected suggests that the dpp4 bat sequences are positively selected genes. If that is the case, which would you expect to be higher—the non-synonymous substitution rate or the synonymous substitution rate? Briefly explain.

Some bat genes that are involved in cellular processes associated with flight are also positively selected genes. Why would some bat sequences have needed to undergo positive selection related to flight?

Go to the RCSB PDB database (http://www.pdb.org) and search for an author named M.J. Walczak.

How many hits? _____________________________

Find the most recent entry. What is its PDB number?

On what date was the structure released? ________________________

What method was used to determine the structure? ___________________

From what organism is polymer 1? ____________________________

In what organism was polymer 1 expressed (cloned)? ______________

From the Molecular Description graphic, how many alpha helices?

There is no SCOP/CATH classification data in this record. Why might that be?

Find PDB entry 2C4A.

1. What is its release date?
2. What is the EC number of the enzyme?
3. Resolution of the crystal structure?
4. List all five ligand chemical components (by identifier)
5. What is the SCOP class?
6. Click the EC number; where does that take you?
