Folliculin protein

FLCN is predicted to encode the 579 amino acid protein FLCN (64 kDa), which consists of a short hydrophobic N-terminal sequence, one N-glycosylation site, three myristoylation sites and a centrally located glutamic acid-rich coiled-coil domain (Nickerson et al., 2002). FLCN has also been observed to be phosphorylated (Baba et al., 2006; Dephoure et al., 2008; Gauci et al., 2009; Piao et al., 2009; Wang et al., 2010) and ubiquitinated (Danielsen et al., 2011; Wagner et al., 2011). Further details regarding these modifications can be found here.

One isoform, which is 342 amino acids in length (UniProt ID: Q8NFG4-2) and consists of the N-terminal 290 amino acids of FLCN plus an additional 52 amino acids not found in the FLCN protein, has been observed in four cDNA libraries (RefSeq ID: NM_144606.5; Ensembl ID: ENST00000389169). How widely this isoform is expressed, and how it functions, is currently unknown. An additional isoform of 197 amino acids has been predicted (UniProt ID: Q8NFG4-3), consisting of the N-terminal 133 amino acids of FLCN plus an additional 64 amino acids that are not found in the FLCN protein. However, there is no experimental evidence to confirm the existence of this isoform in vivo.

The protein sequence of the canonical FLCN isoform is as follows:

MNAIVALCHFCELHGPRTLFCTEVLHAPLPQGDGNEDSPGQGEQAEEEEGGIQMNSRMRAHSPAE

GASVESSSPGPKKSDMCEGCRSLAAGHPGYISHDKETSIKYVSHQHPSHPQLFSIVRQACVRSLSCEV

CPGREGPIFFGDEQHGFVFSHTFFIKDSLARGFQRWYSIITIMMDRIYLINSWPFLLGKVRGIIDELQGK

ALKVFEAEQFGCPQRAQRMNTAFTPFLHQRNGNAARSLTSLTSDDNLWACLHTSFAWLLKACGSR

LTEKLLEGAPTEDTLVQMEKLADLEEESESWDNSEAEEEEKAPVLPESTEGRELTQGPAESSSLSG

CGSWQPRKLPVFKSLRHMRQVLGAPSFRMLAWHVLMGNQVIWKSRDVDLVQSAFEVLRTMLPVG

CVRIIPYSSQYEEAYRCNFLGLSPHVQIPPHVLSSEFAVIVEVHAAARSTLHPVGCEDDQSLSKYEFVV

TSGSPVAADRVGPTILNKIEAALTNQNLSVDVVDQCLVCLKEEWMNKVKVLFKFTKVDSRPKEDTQ

KLLSILGASEEDNVKLLKFWMTGLSKTYKSHLMSTVRSPTASESRN

List of amino acids, their abbreviations and details.
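As a quick consistency check on the figures quoted above (579 amino acids, about 64 kDa), the sequence can be joined and its length and approximate mass computed. The following is a minimal sketch, not a definitive calculation: it assumes standard average residue masses, ignores any post-translational modifications, and uses illustrative names (FLCN_SEGMENTS, AVERAGE_RESIDUE_MASS, average_mass) that do not come from the cited sources.

```python
# Canonical FLCN sequence, copied segment by segment from the listing above.
FLCN_SEGMENTS = [
    "MNAIVALCHFCELHGPRTLFCTEVLHAPLPQGDGNEDSPGQGEQAEEEEGGIQMNSRMRAHSPAE",
    "GASVESSSPGPKKSDMCEGCRSLAAGHPGYISHDKETSIKYVSHQHPSHPQLFSIVRQACVRSLSCEV",
    "CPGREGPIFFGDEQHGFVFSHTFFIKDSLARGFQRWYSIITIMMDRIYLINSWPFLLGKVRGIIDELQGK",
    "ALKVFEAEQFGCPQRAQRMNTAFTPFLHQRNGNAARSLTSLTSDDNLWACLHTSFAWLLKACGSR",
    "LTEKLLEGAPTEDTLVQMEKLADLEEESESWDNSEAEEEEKAPVLPESTEGRELTQGPAESSSLSG",
    "CGSWQPRKLPVFKSLRHMRQVLGAPSFRMLAWHVLMGNQVIWKSRDVDLVQSAFEVLRTMLPVG",
    "CVRIIPYSSQYEEAYRCNFLGLSPHVQIPPHVLSSEFAVIVEVHAAARSTLHPVGCEDDQSLSKYEFVV",
    "TSGSPVAADRVGPTILNKIEAALTNQNLSVDVVDQCLVCLKEEWMNKVKVLFKFTKVDSRPKEDTQ",
    "KLLSILGASEEDNVKLLKFWMTGLSKTYKSHLMSTVRSPTASESRN",
]
FLCN = "".join(FLCN_SEGMENTS)

# Average (not monoisotopic) residue masses in daltons; assumed standard values.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water per chain accounts for the free termini


def average_mass(sequence: str) -> float:
    """Return the approximate average mass (Da) of an unmodified peptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS


if __name__ == "__main__":
    print(f"Length: {len(FLCN)} amino acids")
    print(f"Approximate mass: {average_mass(FLCN) / 1000:.1f} kDa")
```

Because the calculation ignores phosphorylation, ubiquitination and other modifications, it only estimates the mass of the unmodified polypeptide.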

Nahorski et al. (2011) performed in silico evolutionary analysis on the FLCN gene and found that FLCN was under strong purifying selection, meaning that the sequence evolved more slowly at the protein level than the average gene. This was particularly apparent between codons 100 and 230, suggesting an important function for this N-terminal region.

No transmembrane domains or organelle localisation signals have been identified within the FLCN sequence (Warren et al., 2004). The sequence has no significant homology to any known protein, but it is highly conserved across species including Bos taurus, Canis lupus familiaris, Rattus norvegicus, Mus musculus, Gallus gallus, Xenopus tropicalis, Drosophila melanogaster, and Saccharomyces cerevisiae, suggesting an important biological role (Nickerson et al., 2002). A sequence alignment shows this conservation:

 

Alignment made using MUSCLE (Edgar, 2004), manually edited using Jalview (Waterhouse et al., 2009) and displayed with Clustal colours. Alignments performed by Angela Pacitto.