Professional Positions

  • 2014-Present

    Postdoctoral Researcher

    Fiocruz, Carlos Chagas Institute.

  • 2013-Present

    Founder & CEO

    Hexabio Bioinformatics.

Education & Training

  • Ph.D. 2014

    Ph.D. in Bioinformatics & Computational Biology

    Fiocruz, Carlos Chagas Institute.

  • MSc. 2010

    Master in Molecular and Cellular Biology

    Fiocruz, Oswaldo Cruz Institute.

  • Tech. 2014

    Systems Analysis and Development

    Positivo University

  • B.A. 2006

    Biological Sciences

    Positivo University

Honors, Awards and Grants

  • Fiocruz ICC 2013
    Honorable Mention
    Carlos Chagas Institute academic week.
  • Fiocruz IOC 2006
    First of Class
    First place in MSc. selection process, class of 2009.

Societies

  • ISCB
    International Society for Computational Biology
    The International Society for Computational Biology (ISCB) serves over 3,000 members from more than 70 countries by addressing scientific policies, providing access to high quality publications, organizing meetings, and serving as a portal to information about training, education, employment and news from related fields.
  • BrProt
    Brazilian Proteomic Society
    Brazilian society dedicated to organizing and discussing proteomics.
  • Curitiba PM
    Curitiba Perl Mongers
    Local group dedicated to the Perl programming language, which I founded in 2012.

Journal Participation

  • Editor
    Frontiers in Genetics: Bioinformatics & Computational Biology section.
  • Reviewer
    Current Topics in Medicinal Chemistry.

Laboratory Personnel

Nilson Ivo Tonin Zanchin

Laboratory Head

Paulo Costa Carvalho

Research Assistant

Tatiana de Arruda Campos Brasil de Souza

Research Assistant

Juliana de Saldanha da Gama Fischer

Postdoctoral Fellow

Diogo Borges

Doctoral Student

Great Lab Personnel!

These are the people who work with me at the Laboratory for Proteomics and Protein Engineering, Fiocruz.

Research Projects

  • Omics Data Integration

    Cross-layer data integration for omics data.

    I am currently working on different projects involving the integration of large-scale omics data. Integrating different types of data can provide a better interpretation of biological systems.

  • Protein de novo Sequencing

    Similarity-driven analysis of protein de novo sequencing.

    De novo sequencing of proteins still poses major challenges, principally in data interpretation. This project focuses on the development of a sequence similarity-based tool guided by artificial intelligence techniques to improve de novo data analysis.

  • Software Development in Bioinformatics

    Techniques and best practices in software development for bioinformatics.

    Bioinformatics is now one of the major research areas in biological sciences, and yet the formal training of new professionals, the availability of good services for data deposition, and the development of new standards and software coding rules are still major concerns. This project aims to propagate and stimulate the use of good practices of software development in bioinformatics.

  • Bio Docker

    Docker for Bioinformatics

    The main purpose of this project is to promote the use of Docker in bioinformatics and computational biology. Pre-configured containers that bundle different bioinformatics software mitigate critical problems in the field, such as poor reproducibility. Here you will find a list of containers with different bioinformatics software and instructions on how to use them.

Publications

On best practices in the development of bioinformatics software.

Leprevost FV, Barbosa VC, Francisco EL, Perez-Riverol Y, Carvalho PC.
Journal Paper: Frontiers in Genetics. 2014.

Abstract

Bioinformatics is one of the major areas of study in modern biology. Medium- and large-scale quantitative biology studies have created a demand for professionals with proficiency in multiple disciplines, including computer science and statistical inference besides biology. Bioinformatics has now become a cornerstone in biology, and yet the formal training of new professionals (Perez-Riverol et al., 2013; Via et al., 2013), the availability of good services for data deposition, and the development of new standards and software coding rules (Sandve et al., 2013; Seemann, 2013) are still major concerns. Good programming practices range from documentation and code readability through design patterns and testing (Via et al., 2013; Wilson et al., 2014). Here, we highlight some points for best practices and raise important issues to be discussed by the community.

PepExplorer: a similarity-driven tool for analyzing de novo sequencing results.

Leprevost FV, Valente RH, Borges DL, Perales J, Melani R, Yates III JR, Barbosa VC, Junqueira M, Carvalho PC.
Journal Paper: Molecular & Cellular Proteomics. 2014.

Abstract

Peptide Spectrum Matching (PSM) is the current gold standard for protein identification by mass spectrometry-based proteomics. PSM compares experimental mass spectra against theoretical spectra generated from a protein sequence database to perform identification, but protein sequences not present in a database cannot be identified unless their sequences are in part conserved. The alternative approach, de novo sequencing, can infer a peptide sequence directly from a mass spectrum, but interpreting long lists of very similar peptide sequences resulting from large-scale experiments is not trivial. With this as motivation, PepExplorer was developed to use rigorous pattern recognition to assemble a list of homologue proteins using de novo sequencing data coupled to sequence alignment to allow biological interpretation of the data. PepExplorer can read the output of various widely adopted de novo sequencing tools and converge to a list of proteins with a global false-discovery rate (FDR). To this end, it employs a radial basis function neural network that considers precursor charge states, de novo sequencing scores, peptide lengths, and alignment scores to select similar protein candidates, from a target-decoy database, usually obtained from phylogenetically related species. Alignments are performed using a modified Smith-Waterman algorithm tailored for the task at hand. We have verified the effectiveness of our approach on a reference set of identifications generated by ProLuCID when searching for Pyrococcus furiosus mass spectra on the corresponding NCBI RefSeq database. We then modified the sequence database by swapping amino acids until ProLuCID was no longer capable of identifying any proteins. By searching the mass spectra using PepExplorer on the modified database, we have been able to recover most of the identifications at a 1% FDR. Finally, we have employed PepExplorer to disclose a comprehensive proteomic assessment of the Bothrops jararaca plasma, a known biological source of natural inhibitors of snake toxins. PepExplorer is integrated into the PatternLab for Proteomics environment, which makes available various tools for downstream data analysis, including resources for quantitative and differential proteomics.
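
The abstract above mentions alignments based on a modified Smith-Waterman algorithm. As a rough illustration only, the following Python sketch computes the score of the standard, unmodified Smith-Waterman local alignment. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example; they are not PepExplorer's actual parameters or code.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Standard Smith-Waterman with a linear gap penalty. The scoring
    values are illustrative assumptions, not PepExplorer's settings.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# A de novo peptide embedded in a longer sequence still aligns locally.
print(smith_waterman_score("XXPEPTIDEYY", "PEPTIDE"))
```

Because the score is clamped at zero, unrelated flanking residues do not penalize a well-conserved region, which is why local alignment suits partially conserved sequences.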

Bio::DB::NextProt: A Perl Module for neXtProt Database Information Retrieval.

Leprevost FV.
Journal Paper: PeerJ. 2014.

Abstract

The neXtProt database is a comprehensive knowledge platform recently adopted by the Chromosome-centric Human Proteome Project as the main reference database. The primary goal of the project is to identify and catalog every human protein encoded in the human genome. To that end, computational approaches play an important role, as data analysis and dedicated software are indispensable. Here we describe Bio::DB::NextProt, a Perl module that provides object-oriented access to the neXtProt REST Web services, enabling the programmatic retrieval of structured information. The Bio::DB::NextProt module presents a new way to interact with and download information from the neXtProt database. Every parameter available through the REST API is covered by the module, allowing a fast, dynamic and ready-to-use alternative for those who need to access neXtProt data. Bio::DB::NextProt is an easy-to-use module that provides automatic retrieval of data, ready to be integrated into third-party software or to be used by other programmers on the fly. The module is freely available from CPAN (metacpan.org/release/Bio-DB-NextProt) and GitHub (github.com/Leprevost/Bio-DB-NextProt) and is released under the perl_5 license.

Proteome Analysis of Formalin-Fixed Paraffin-Embedded Tissues from a Primary Gastric Melanoma and its Meningeal Metastasis: A Case Report.

Fischer JSG, Canedo NHS, Gonçalves KMS, Chimelli MC, França M, Leprevost FV, Aquino PF, Carvalho PC, Carvalho MGC.
Journal Paper: Current Topics in Medicinal Chemistry. 2014.

Abstract

Melanoma is the third most common brain metastasis cause in the United States as it has a relatively high susceptibility to metastasize to the central nervous system. Among the different origins for brain metastasis, those originating from primary gastric melanomas are extremely rare. Here, we compare protein profiles obtained from formalin-fixed paraffin-embedded (FFPE) tissues of a primary gastric melanoma with its meningeal metastasis. For this, the contents of a microscope slide were scraped and ultimately analyzed by nano-chromatography coupled online with tandem mass spectrometry using an Orbitrap XL. Our results disclose 184 proteins uniquely identified in the primary gastric melanoma, 304 in the meningeal metastasis, and 177 in common. Notably, we identified several enzymes related to changes in the metabolism that are linked to producing energy by elevated rates of glycolysis in a process called the Warburg effect. Moreover, we show that our FFPE proteomic approach allowed identification of key biological markers such as the S100 protein that we further validated by immunohistochemistry for both the primary and metastatic tumor samples. That said, we demonstrated a powerful strategy to retrospectively mine data for aiding in the understanding of metastasis, biomarker discovery, and ultimately, diseases. To our knowledge, these results disclose for the first time a comparison of the proteomic profiles of gastric melanoma and its corresponding meningeal metastasis.

Pinpointing differentially expressed domains in complex protein mixtures with the cloud service of PatternLab for Proteomics

Leprevost FV, Borges D, Crestani J, Perez-Riverol Y, Zanchin N, Barbosa VC, Carvalho PC.
Journal Paper: Journal of Bioinformatics. 2013.

Abstract

Mass-spectrometry-based shotgun proteomics has become a widespread technology for analyzing complex protein mixtures. Here we describe a new module integrated into PatternLab for Proteomics that allows the pinpointing of differentially expressed domains. This is accomplished by inferring functional domains through our cloud service, using HMMER3 and Pfam remotely, and then mapping the quantitation values into domains for downstream analysis. In all, spotting which functional domains are changing when comparing biological states serves as a complementary approach to facilitate the understanding of a system's biology. We exemplify the new module's use by reanalyzing a previously published MudPIT dataset of Cryptococcus gattii cultivated under iron-depleted and replete conditions. We show how the differential analysis of functional domains can facilitate the interpretation of proteomic data by providing further valuable insight.

Effectively addressing complex proteomic search spaces with peptide spectrum matching

Lima DB, Perez-Riverol Y, Nogueira FCS, Domont GB, Noda J, Leprevost FV, Besada V, Franca FMG, Barbosa VC, Sanchez A, Carvalho PC.
Journal Paper: Journal of Bioinformatics. 2013.

Abstract

Protein identification by mass spectrometry is commonly accomplished using a peptide sequence matching search algorithm, whose sensitivity varies inversely with the size of the sequence database and the number of post-translational modifications considered. We present the Spectrum Identification Machine, a peptide sequence matching tool that capitalizes on the high-intensity b1-fragment ion of tandem mass spectra of peptides coupled in solution with phenylisothiocyanate to confidently sequence the first amino acid and ultimately reduce the search space. We demonstrate that in complex search spaces, a gain of some 120% in sensitivity can be achieved.
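
To make the search-space reduction described above more concrete, here is a minimal Python sketch. It assumes the first amino acid of a peptide is already known (in the paper, from the b1-fragment ion) and simply restricts the candidate list to peptides starting with that residue. The function names and the example peptides are hypothetical and are not taken from the Spectrum Identification Machine.

```python
from collections import defaultdict


def index_by_first_residue(peptides):
    """Group candidate peptide sequences by their N-terminal residue."""
    index = defaultdict(list)
    for peptide in peptides:
        if peptide:  # skip empty sequences
            index[peptide[0]].append(peptide)
    return index


def candidates_for(index, first_residue):
    """Return only the candidates whose first residue matches."""
    return index.get(first_residue, [])


# Hypothetical candidate peptides from a sequence database.
database = ["PEPTIDEK", "ELVISK", "PROTEINR", "LIVESK"]
index = index_by_first_residue(database)
print(candidates_for(index, "P"))
```

Only the candidates sharing the known first residue need to be scored against the spectrum, so the number of comparisons shrinks before any spectrum matching takes place.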

Computational Proteomics Pitfalls and Challenges: HavanaBioinfo 2012 Workshop Report

Perez-Riverol Y, Hermjakob H, Kohlbacher O, Martens L, Creasy D, Cox J, Leprevost FV, Shan BP, Cabrera G, Padron G, Gonzales LJ, Besada V.
Journal Paper: Journal of Proteomics. 2013.

Abstract

The workshop “Bioinformatics for Biotechnology Applications (HavanaBioinfo 2012)”, held December 8–11, 2012 in Havana, aimed at exploring new bioinformatics tools and approaches for large-scale proteomics, genomics and chemoinformatics. Major conclusions of the workshop include the following: (i) development of new applications and bioinformatics tools for proteomic repository analysis is crucial; current proteomic repositories contain enough data (spectra/identifications) that can be used to increase the annotations in protein databases and to generate new tools for protein identification; (ii) spectral libraries, de novo sequencing and database search tools should be combined to increase the number of protein identifications; (iii) protein probabilities and FDR are not yet sufficiently mature; (iv) computational proteomics software needs to become more intuitive; and at the same time appropriate education and training should be provided to help in the efficient exchange of knowledge between mass spectrometrists and experimental biologists and bioinformaticians in order to increase their bioinformatics background, especially statistics knowledge.

Software

  • C++ 2014

    Mmapper

    A tool for mapping peptides against omics data (under development).

  • C# 2013

    PepExplorer

    A similarity-driven tool for analyzing de novo sequencing results.

  • Java 2012

    TrypViewer

    An ultrafast desktop application for Trypanosomatid genomic data retrieval.

  • Java 2012

    Selenide

    A distributed system for remote media management.

  • Perl 2011

    TrypanosOmics

    A web-based data visualization system for the T. cruzi ORFeome project.

  • Perl 2011

    Liquid

    A laboratory stock control system focused on next-generation platforms.

Modules & Libraries

  • Perl 2013

    Bio::DB::NextProt

    Object-oriented interface to the neXtProt REST API.

  • Perl 2013

    Math::SparseMatrix::Operations

    Mathematical operations with matrices.

  • Perl 2013

    AI::NeuralNet::Hopfield

    A Perl implementation of a Hopfield neural network.

  • Perl 2012

    Bio::Tools::Alignment::Overview

    A bird's-eye viewer for large multiple alignments.

At My Office

You can find me at my office, located in Curitiba, Paraná, Brazil.

I am at my office three days a week from 9:00 until 18:00, but please consider calling ahead to schedule an appointment.

At My Lab

You can find me at my lab, located at Fiocruz, Curitiba, Paraná, Brazil.

I am at my lab two days a week from 8:00 until 17:00, but please consider calling ahead to schedule an appointment.