Academic Positions

  • 2015 – Present

    Research Fellow

    University of Michigan

  • 2014

    Postdoc

    Fiocruz, Carlos Chagas Institute

Education & Training

  • Ph.D. 2014

    Ph.D. in Bioinformatics

    Fiocruz, Carlos Chagas Institute

  • MSc. 2010

    Master in Molecular and Cellular Biology

    Fiocruz, Oswaldo Cruz Institute

  • Tech. 2014

    Systems Analysis and Development

    Positivo University

  • B.A. 2006

    Biological Sciences

    Positivo University

Research Summary

Recent advances in molecular biology, more specifically in the omics sciences, are creating a need for better analysis methodologies. Much of my work centers on the development of bioinformatics software for proteomics and exploratory data analysis. As a systems analyst and developer, I have been promoting good practices in developing computational tools for the life sciences while, at the same time, applying my knowledge to create and share open-source libraries and software that others can use in their projects. As a molecular biologist, my interest centers on how to properly analyze and interpret biological data, more specifically proteomics data. My focus lies mainly on developing better exploratory data analysis and techniques for protein functional analysis, aiming for better characterization of biological phenomena.

More recently, I started working with protein post-translational modification (PTM) annotation systems, seeking to develop a robust annotation pipeline for large data sets. By applying different methodologies, such as open searches, in landscape scenarios, we hope to shed more light on misinterpreted peptide mass spectra.

Interests

  • Molecular & Cell Biology
  • Proteomics
  • Protein Functional Annotation
  • Exploratory Data Analysis
  • Software Development
  • Molecular Biology

Laboratory Personnel

Dmitriy Avtonomov

Postdoctoral fellow

Dattatreya Mellacheruvu

Postdoctoral fellow

Alexey Nesvizhskii

Associate Professor

Guo Ci Teo

Postdoctoral fellow

Andy Kong

PhD Student

Venkatesha Basrur

Assistant Research Scientist

Kevin Conlon

Research Lab Specialist Senior

Great Lab Personnel!

These are the people who work with me at the Proteome Bioinformatics laboratory.

Discovering and linking public omics data sets using the Omics Discovery Index

Perez-Riverol Y, Bai M, Leprevost FV, Squizzato S, Park YM, Haug K, Carroll AJ, Spalding D, Paschall J, Wang M, del-Toro N, Ternent T, Zhang P, Buso N, Bandeira N, Deutsch EW, Campbell DS, Beavis RC, Salek RM, Sarkans U, Petryszak R, Keays M, Fahy E, Sud M, Subramaniam S, Barbera A, Jiménez RC, Nesvizhskii AI, Sansone S, Steinbeck C, Lopez R, Vizcaíno JA, Ping P, Hermjakob H
Journal Paper. Nature Biotechnology, 2017.

Abstract

Biomedical data are being produced at an unprecedented rate owing to the falling cost of experiments and wider access to genomics, transcriptomics, proteomics and metabolomics platforms. As a result, public deposition of omics data is on the increase. This presents new challenges, including finding ways to store, organize and access different types of biomedical data stored on different platforms. Here, we present the Omics Discovery Index (OmicsDI; http://www.omicsdi.org), an open-source platform that enables access, discovery and dissemination of omics data sets.

MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry–based proteomics

Kong AT, Leprevost FV, Avtonomov DM, Mellacheruvu D, Nesvizhskii AI
Journal Paper. Nature Methods, 2017.

Abstract

There is a need to better understand and handle the 'dark matter' of proteomics—the vast diversity of post-translational and chemical modifications that are unaccounted in a typical mass spectrometry–based analysis and thus remain unidentified. We present a fragment-ion indexing method, and its implementation in peptide identification tool MSFragger, that enables a more than 100-fold improvement in speed over most existing proteome database search tools. Using several large proteomic data sets, we demonstrate how MSFragger empowers the open database search concept for comprehensive identification of peptides and all their modified forms, uncovering dramatic differences in modification rates across experimental samples and conditions. We further illustrate its utility using protein–RNA cross-linked peptide data and using affinity purification experiments where we observe, on average, a 300% increase in the number of identified spectra for enriched proteins. We also discuss the benefits of open searching for improved false discovery rate estimation in proteomics.

BioContainers: An open-source and community-driven framework for software standardization

Leprevost FV, Grüning BA, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y
Journal Paper. Bioinformatics, 2017.

Abstract

BioContainers (biocontainers.pro) is an open-source and community-driven framework that provides platform-independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source frameworks Docker and rkt, which allow software to be installed and executed in an isolated and controlled environment. It also provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers, with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters).

Quantitative proteomic analysis of the Saccharomyces cerevisiae industrial strains CAT-1 and PE-2

Santos RM, Nogueira FCS, Brasil AA, Carvalho PC, Leprevost FV, Domont GB, Eleutherio ECA
Journal Paper. Journal of Proteomics, 2016.

Abstract

The Brazilian ethanol fermentation process commonly uses baker's yeast as inoculum. In recent years, wild-type yeast strains have been widely adopted. The two most successful examples are PE-2 and CAT-1, currently employed in Brazilian distilleries. In the present study, we analyzed how these strains compete for nutrients in the same environment and compared the potential characteristics that affect their performance by applying quantitative proteomics methods. Through the use of isobaric tagging, it was possible to compare protein abundances between both strains during the fermentation process. Our results revealed better fermentation performance and robustness of the CAT-1 strain. The proteomic results demonstrated many possible features that may be linked to the improved fermentation traits of CAT-1. Proteins involved in response to oxidative stress (Sod1 and Trx1) and trehalose synthesis (Tps3) were more abundant in CAT-1 than in PE-2 after a fermentation batch. Tolerance to oxidative stress and trehalose accumulation were subsequently demonstrated to be enhanced for CAT-1, corroborating the comparative proteomic results. The importance of trehalose and the antioxidant system was confirmed by using mutant strains deleted in Sod1, Trx1 or Tps3. These deletions impaired fermentation performance, strengthening the idea that the abilities of accumulating high levels of trehalose and coping with oxidative stress are crucial for improving fermentation.

Ten Simple Rules for Taking Advantage of Git and GitHub

Perez-Riverol Y, Gatto L, Wang R, Sachsenberg T, Uszkoreit J, Leprevost FV, Fufezan C, Ternent T, Eglen SJ, Katz DS, Pollard TJ, Konovalov A, Flight RM, Blin K, Vizcaíno JA
Journal Paper. PLOS Computational Biology, 2016.

Abstract

Bioinformatics is a broad discipline in which one common denominator is the need to produce and/or use software that can be applied to biological data in different contexts. To enable and ensure the replicability and traceability of scientific claims, it is essential that the scientific publication, the corresponding datasets, and the data analysis are made publicly available.

Venomous extract protein profile of Brazilian tarantula Grammostola iheringi: searching for potential biotechnological applications

Borges MH, Figueiredo SG, Leprevost FV, Lima ME, Cordeiro MN, Diniz MRV, Moresco J, Carvalho PC, Yates III JR
Journal Paper. Journal of Proteomics, 2016.

Abstract

Tarantula spiders, Theraphosidae family, are spread throughout most tropical regions of the world. Despite their size and reputation, there are few reports of accidents. However, like other spiders, their venom is considered a remarkable source of toxins, which have been selected through millions of years of evolution. The present work provides a proteomic overview of the fascinating complexity of the venomous extract of the Grammostola iheringi tarantula, obtained by electrical stimulation of the chelicerae. For analysis, a bottom-up proteomic approach, Multidimensional Protein Identification Technology (MudPIT), was used. Based on bioinformatics analyses with PepExplorer, a similarity-driven search tool that identifies proteins based on phylogenetically close organisms, a total of 395 proteins were identified in this venomous extract. Most of the identifications (~70%) were classified as predicted (21%), hypothetical (6%) and putative (37%), while a small group (6%) had no predicted function. Identified molecules matched with neurotoxins that act on ion channels; proteases, such as serine proteases, metalloproteinases, cysteine proteinases, aspartic proteinases, carboxypeptidases and cysteine-rich secretory enzymes (CRISP); and some molecules with unknown targets. Additionally, non-classical venom proteins were also identified. To date, this study represents the first broad characterization of the composition of G. iheringi venomous extract. Our data provide a tantalizing insight into the diversity of proteins in this venom and their biotechnological potential.

Integrated analysis of shotgun proteomic data with PatternLab for proteomics 4.0

Carvalho PC, Lima DB, Leprevost FV, Santos MDM, Fischer JSG, Aquino PF, Moresco JJ, Yates III JR, Barbosa VC
Journal Paper. Nature Protocols, 2016.

Abstract

PatternLab for proteomics is an integrated computational environment that unifies several previously published modules for the analysis of shotgun proteomic data. The contained modules allow for formatting of sequence databases, peptide spectrum matching, statistical filtering and data organization, extracting quantitative information from label-free and chemically labeled data, and analyzing statistics for differential proteomics. PatternLab also has modules to perform similarity-driven studies with de novo sequencing data, to evaluate time-course experiments and to highlight the biological significance of data with regard to the Gene Ontology database. The PatternLab for proteomics 4.0 package brings together all of these modules in a self-contained software environment, which allows for complete proteomic data analysis and the display of results in a variety of graphical formats. All updates to PatternLab, including new features, have been previously tested on millions of mass spectra. PatternLab is easy to install, and it is freely available from http://patternlabforproteomics.org.

Using PepExplorer to Filter and Organize De Novo Peptide Sequencing Results

Leprevost FV, Barbosa VC, Carvalho PC
Book Chapter. Current Protocols in Bioinformatics, 2015.

Abstract

PepExplorer aids in the biological interpretation of de novo sequencing results; this is accomplished by assembling a list of homolog proteins obtained by aligning results from widely adopted de novo sequencing tools against a target-decoy sequence database. Our tool relies on pattern recognition to ensure that the results satisfy a user-given false-discovery rate (FDR). For this, it employs a radial basis function neural network that considers the precursor charge states, de novo sequencing scores, the peptide lengths, and alignment scores. PepExplorer is recommended for studies addressing organisms with no genomic sequence available. PepExplorer is integrated into the PatternLab for proteomics environment, which makes available various tools for downstream data analysis, including the resources for quantitative and differential proteomics.

Reevaluating the Trypanosoma cruzi proteomic map: the shotgun description of bloodstream trypomastigotes

Brunoro GVF, Caminha MA, Ferreira AT da Silva F, Leprevost FV, Carvalho PC, Perales J, Valente RH, Menna-Barreto FRS
Journal Paper. Journal of Proteomics, 2015.

Abstract

Chagas disease is a neglected disease caused by the protozoan Trypanosoma cruzi. This kinetoplastid presents a cycle involving different forms and hosts, with trypomastigotes being the main infective form. Despite various T. cruzi proteomic studies, the profile of bloodstream trypomastigotes remains unexplored. The aim of this work is the proteomic description of the T. cruzi bloodstream form. Employing a shotgun approach, 17,394 peptides were identified, corresponding to 7,514 proteins, of which 5,901 belong to T. cruzi. Cytoskeletal proteins, chaperones, bioenergetics-related enzymes and trans-sialidases are among the top-scoring. GO analysis revealed that all T. cruzi compartments were assessed, and the majority of proteins are involved in metabolic processes and/or presented catalytic activity. The comparative analysis between the proteomic profiles of bloodstream trypomastigotes and culture-derived or metacyclic trypomastigotes pointed to 2,202 proteins exclusively detected in the bloodstream form. These exclusive proteins are related to: (a) surface proteins; (b) non-classical secretion pathway; (c) cytoskeletal dynamics; (d) cell cycle and transcription; (e) proteolysis; (f) redox metabolism; (g) biosynthetic pathways; (h) bioenergetics; (i) protein folding; (j) cell signaling; (k) vesicular traffic; (l) DNA repair; (m) cell death. This large-scale evaluation of bloodstream trypomastigotes, responsible for the parasite dissemination in the patient, marks a step forward in the comprehension of Chagas disease pathogenesis.

On best practices in the development of bioinformatics software

Leprevost FV, Barbosa VC, Francisco EL, Perez-Riverol Y, Carvalho PC
Journal Paper. Frontiers in Genetics, 2014.

Abstract

Bioinformatics is one of the major areas of study in modern biology. Medium- and large-scale quantitative biology studies have created a demand for professionals with proficiency in multiple disciplines, including computer science and statistical inference besides biology. Bioinformatics has now become a cornerstone in biology, and yet the formal training of new professionals (Perez-Riverol et al., 2013; Via et al., 2013), the availability of good services for data deposition, and the development of new standards and software coding rules (Sandve et al., 2013; Seemann, 2013) are still major concerns. Good programming practices range from documentation and code readability through design patterns and testing (Via et al., 2013; Wilson et al., 2014). Here, we highlight some points for best practices and raise important issues to be discussed by the community.

PepExplorer: a similarity-driven tool for analyzing de novo sequencing results

Leprevost FV, Valente RH, Borges DL, Perales J, Melani R, Yates III JR, Barbosa VC, Junqueira M, Carvalho PC
Journal Paper. Molecular & Cellular Proteomics, 2014.

Abstract

Peptide Spectrum Matching (PSM) is the current gold standard for protein identification by mass spectrometry-based proteomics. PSM compares experimental mass spectra against theoretical spectra generated from a protein sequence database to perform identification, but protein sequences not present in a database cannot be identified unless their sequences are in part conserved. The alternative approach, de novo sequencing, can infer a peptide sequence directly from a mass spectrum, but interpreting long lists of very similar peptide sequences resulting from large-scale experiments is not trivial. With this as motivation, PepExplorer was developed to use rigorous pattern recognition to assemble a list of homologue proteins using de novo sequencing data coupled to sequence alignment to allow biological interpretation of the data. PepExplorer can read the output of various widely adopted de novo sequencing tools and converge to a list of proteins with a global false-discovery rate (FDR). To this end, it employs a radial basis function neural network that considers precursor charge states, de novo sequencing scores, peptide lengths, and alignment scores to select similar protein candidates, from a target-decoy database, usually obtained from phylogenetically related species. Alignments are performed using a modified Smith-Waterman algorithm tailored for the task at hand. We have verified the effectiveness of our approach on a reference set of identifications generated by ProLuCID when searching for Pyrococcus furiosus mass spectra on the corresponding NCBI RefSeq database. We then modified the sequence database by swapping amino acids until ProLuCID was no longer capable of identifying any proteins. By searching the mass spectra using PepExplorer on the modified database, we have been able to recover most of the identifications at a 1% FDR. Finally, we have employed PepExplorer to disclose a comprehensive proteomic assessment of the Bothrops jararaca plasma, a known biological source of natural inhibitors of snake toxins. PepExplorer is integrated into the PatternLab for Proteomics environment, which makes available various tools for downstream data analysis, including resources for quantitative and differential proteomics.
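
The idea of filtering results against a target-decoy database at a chosen FDR can be illustrated with a minimal Python sketch. This is a generic target-decoy estimate (decoy hits divided by target hits above a score cutoff), not PepExplorer's neural-network procedure; the function name, the input layout and the scores are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def score_threshold_at_fdr(
    hits: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    Each hit is (score, is_decoy). The FDR at a cutoff is estimated as
    decoys / targets among all hits scoring at or above that cutoff.
    Assumes higher scores are better and scores are distinct.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    targets = 0
    decoys = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep extending the accepted list while the estimate stays in bounds.
        if targets > 0 and decoys / targets <= max_fdr:
            threshold = score
    return threshold


# Hypothetical alignment scores; True marks a hit against a decoy sequence.
example_hits = [
    (9.1, False), (8.7, False), (8.0, False),
    (7.5, True), (7.2, False), (6.9, True),
]
print(score_threshold_at_fdr(example_hits, max_fdr=0.25))
```

Hits scoring at or above the returned cutoff would be kept; a return value of None means no cutoff satisfies the requested FDR.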

Bio::DB::NextProt: A Perl Module for neXtProt Database Information Retrieval

Leprevost FV.
Journal Paper. PeerJ, 2014.

Abstract

The neXtProt database is a comprehensive knowledge platform recently adopted by the Chromosome-centric Human Proteome Project as the main reference database. The primary goal of the project is to identify and catalog every human protein encoded in the human genome. To this end, computational approaches play an important role, as data analysis and dedicated software are indispensable. Here we describe Bio::DB::NextProt, a Perl module that provides object-oriented access to the neXtProt REST Web services, enabling the programmatic retrieval of structured information. The Bio::DB::NextProt module presents a new way to interact with and download information from the neXtProt database. Every parameter available through the REST API is covered by the module, allowing a fast, dynamic and ready-to-use alternative for those who need to access neXtProt data. Bio::DB::NextProt is an easy-to-use module that provides automatic retrieval of data, ready to be integrated into third-party software or to be used by other programmers on the fly. The module is freely available from CPAN (metacpan.org/release/Bio-DB-NextProt) and GitHub (github.com/Leprevost/Bio-DB-NextProt) and is released under the perl_5 license.

Proteome Analysis of Formalin-Fixed Paraffin-Embedded Tissues from a Primary Gastric Melanoma and its Meningeal Metastasis: A Case Report

Fischer JSG, Canedo NHS, Gonçalves KMS, Chimelli MC, França M, Leprevost FV, Aquino PF, Carvalho PC, Carvalho MGC
Journal Paper. Current Topics in Medicinal Chemistry, 2014.

Abstract

Melanoma is the third most common brain metastasis cause in the United States as it has a relatively high susceptibility to metastasize to the central nervous system. Among the different origins for brain metastasis, those originating from primary gastric melanomas are extremely rare. Here, we compare protein profiles obtained from formalin-fixed paraffin-embedded (FFPE) tissues of a primary gastric melanoma with its meningeal metastasis. For this, the contents of a microscope slide were scraped and ultimately analyzed by nano-chromatography coupled online with tandem mass spectrometry using an Orbitrap XL. Our results disclose 184 proteins uniquely identified in the primary gastric melanoma, 304 in the meningeal metastasis, and 177 in common. Notably, we identified several enzymes related to changes in the metabolism that are linked to producing energy by elevated rates of glycolysis in a process called the Warburg effect. Moreover, we show that our FFPE proteomic approach allowed identification of key biological markers such as the S100 protein, which we further validated by immunohistochemistry for both the primary and metastatic tumor samples. That said, we demonstrated a powerful strategy to retrospectively mine data for aiding in the understanding of metastasis, biomarker discovery, and ultimately, diseases. To our knowledge, these results disclose for the first time a comparison of the proteomic profiles of gastric melanoma and its corresponding meningeal metastasis.

Pinpointing differentially expressed domains in complex protein mixtures with the cloud service of PatternLab for Proteomics

Leprevost FV, Borges D, Crestani J, Perez-Riverol Y, Zanchin N, Barbosa VC, Carvalho PC
Journal Paper. Journal of Proteomics, 2013.

Abstract

Mass-spectrometry-based shotgun proteomics has become a widespread technology for analyzing complex protein mixtures. Here we describe a new module integrated into PatternLab for Proteomics that allows the pinpointing of differentially expressed domains. This is accomplished by inferring functional domains through our cloud service, using HMMER3 and Pfam remotely, and then mapping the quantitation values into domains for downstream analysis. In all, spotting which functional domains are changing when comparing biological states serves as a complementary approach to facilitate the understanding of a system's biology. We exemplify the new module's use by reanalyzing a previously published MudPIT dataset of Cryptococcus gattii cultivated under iron-depleted and replete conditions. We show how the differential analysis of functional domains can facilitate the interpretation of proteomic data by providing further valuable insight.

Effectively addressing complex proteomic search spaces with peptide spectrum matching

Lima DB, Perez-Riverol Y, Nogueira FCS, Domont GB, Noda J, Leprevost FV, Besada V, Franca FMG, Barbosa VC, Sanchez A, Carvalho PC
Journal Paper. Bioinformatics, 2013.

Abstract

Protein identification by mass spectrometry is commonly accomplished using a peptide sequence matching search algorithm, whose sensitivity varies inversely with the size of the sequence database and the number of post-translational modifications considered. We present the Spectrum Identification Machine, a peptide sequence matching tool that capitalizes on the high-intensity b1-fragment ion of tandem mass spectra of peptides coupled in solution with phenylisothiocyanate to confidently sequence the first amino acid and ultimately reduce the search space. We demonstrate that in complex search spaces, a gain of some 120% in sensitivity can be achieved.

Computational Proteomics Pitfalls and Challenges: HavanaBioinfo 2012 Workshop Report

Perez-Riverol Y, Hermjakob H, Kohlbacher O, Martens L, Creasy D, Cox J, Leprevost FV, Shan BP, Cabrera G, Padron G, Gonzales LJ, Besada V
Journal Paper. Journal of Proteomics, 2013.

Abstract

The workshop “Bioinformatics for Biotechnology Applications (HavanaBioinfo 2012)”, held December 8–11, 2012 in Havana, aimed at exploring new bioinformatics tools and approaches for large-scale proteomics, genomics and chemoinformatics. Major conclusions of the workshop include the following: (i) development of new applications and bioinformatics tools for proteomic repository analysis is crucial; current proteomic repositories contain enough data (spectra/identifications) that can be used to increase the annotations in protein databases and to generate new tools for protein identification; (ii) spectral libraries, de novo sequencing and database search tools should be combined to increase the number of protein identifications; (iii) protein probabilities and FDR are not yet sufficiently mature; (iv) computational proteomics software needs to become more intuitive; and at the same time appropriate education and training should be provided to help in the efficient exchange of knowledge between mass spectrometrists and experimental biologists and bioinformaticians in order to increase their bioinformatics background, especially statistics knowledge.

  • BioContainers

    Projecting and distributing bioinformatics containers

    BioContainers is an open-source and community-driven framework that provides system-agnostic executable environments for bioinformatics software. The BioContainers framework allows software to be installed and executed in an isolated and controllable environment.
  • Software Development in Bioinformatics

    Best practices in software development

    Bioinformatics is now one of the major research areas in biological sciences, and yet the formal training of new professionals, the availability of good services for data deposition, and the development of new standards and software coding rules are still major concerns. This project aims to propagate and stimulate the use of good practices of software development in bioinformatics.

Contact & Meet Me

I would be happy to talk to you if you need my assistance in your research or business administration support for your company.

  •    phone: +1 (734) 436-1805
  •    lab: +1 (734) 764-3516
  •    felipe@leprevost.com.br
  •    leprevostfv
  •    #leprevostfv

At My Lab

You can find me at my office at the University of Michigan, Ann Arbor.

I am at my office two days a week from 9:00 until 17:00, but please consider calling ahead to set up an appointment.