bioinformatics project ideas

5 Machine Learning Projects in Bioinformatics For Practice

The term "bioinformatics" represents the use of computation and analysis methods to collect and analyze biological data. It's a multidisciplinary field that combines genetics, biology, statistics, mathematics, and computer science. Various branches of bioinformatics, including genomics, proteomics, and microarrays, extensively use machine learning for better outcomes.

Top 5 Machine Learning Projects in Bioinformatics

Here are five machine learning projects for bioinformatics that illustrate how machine learning is applied in healthcare, and in bioinformatics in particular.

1. Anti-Cancer Drug Efficacy Prediction

Predicting which patients are likely to benefit from a specific therapy is a significant concern in cancer treatment because not all patients respond to a particular medication. Accurate prediction improves the efficacy of treatment and spares non-responders unnecessary suffering. There is therefore an immediate need to find reliable biomarkers (i.e., genes or proteins) that can precisely predict which patients respond best to which medications. For this project, you will use fundamental data science techniques, such as data processing, integration, analysis, and visualization, to determine the most effective biomarkers for various cancer types.

2. Autism Mutation Detection

In this machine learning project for bioinformatics, you will develop a deep-learning-based system that predicts the regulatory effects and harmful impacts of genetic variants, addressing the problem of detecting the impact of noncoding mutations on disease. This predictive genomics framework is likely relevant to complex human diseases, illustrates the significance of noncoding mutations in ASD (autism spectrum disorder), and identifies mutations with higher effects for further analysis. If you want to add a distinctive project to your machine learning portfolio, this one is worth trying.

3. Personalized Cancer Medication

This deep learning project predicts how different genetic variations affect a patient's health. You can use the MSKCC (Memorial Sloan Kettering Cancer Center) database, which includes thousands of mutations that scientists and physicians have thoroughly classified. Using this data set as a starting point, you will build a model with the Keras deep learning library and an LSTM that automatically categorizes genetic variants. The project also involves NLP text processing techniques such as lemmatization, stemming, and tokenization.

4. Human Disease Genetic Basis Identification

Human genomes vary between individuals by about 0.1%. Our genetic predisposition to specific disorders, such as hypertension, is encoded within this small degree of variation. By comparing populations of healthy and diseased people and the variations in their disease-related genes, we can identify which gene variants are associated with each disease. In this bioinformatics, AI, and machine learning project, you develop strategies for finding the variation corresponding to a disease, along with statistics to support the predictions. The project also develops methods for predicting how a gene mutation can alter protein structure or regulatory structure. You can also estimate the history and evolution of a disease risk factor by reconstructing the genes' phylogeny.
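To make the idea of comparing healthy and diseased populations concrete, here is a minimal sketch of a case-control association test for a single variant. It is one common way to attach a statistic to such a comparison, not necessarily the method this project uses. The allele counts are hypothetical, and the function names are illustrative.

```python
import math


def odds_ratio(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> float:
    """Odds of carrying the alternate allele in cases relative to controls."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical allele counts, invented for illustration only.
case_alt, case_ref = 60, 140   # alternate / reference alleles in cases
ctrl_alt, ctrl_ref = 40, 160   # alternate / reference alleles in controls

variant_or = odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref)
variant_chi2 = chi_square_2x2(case_alt, case_ref, ctrl_alt, ctrl_ref)
variant_p = p_value_1df(variant_chi2)
print(f"odds ratio={variant_or:.2f}, chi2={variant_chi2:.2f}, p={variant_p:.4f}")
```

An odds ratio above 1 suggests the alternate allele is more common in cases. In a genome-wide setting the same test is repeated for many variants, so the p-values would need correction for multiple testing.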

5. Build a DNA Sequence Classifier

In this project, you will build a classification model that predicts a gene's function from the DNA sequence of its coding sequence alone. You will write a function that extracts all overlapping k-mers of a given length from any sequence string, join each gene's list of k-mers into a space-separated string, and count the k-mers using scikit-learn NLP tools.
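The k-mer step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the function names and the toy sequence are hypothetical, and counting is done here with the standard library rather than scikit-learn so the snippet runs without extra dependencies.

```python
from collections import Counter


def get_kmers(sequence: str, k: int = 6) -> list[str]:
    """Return all overlapping k-mers of length k from a sequence string."""
    sequence = sequence.lower()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_sentence(sequence: str, k: int = 6) -> str:
    """Join the k-mers with spaces so they read like words in a sentence."""
    return " ".join(get_kmers(sequence, k))


# Hypothetical toy sequence, not taken from any real gene.
toy_sequence = "ATGCATGCA"

kmers = get_kmers(toy_sequence, k=4)
kmer_counts = Counter(kmers)          # how often each k-mer occurs
sentence = kmer_sentence(toy_sequence, k=4)

print(kmers)
print(kmer_counts)
print(sentence)
```

In a fuller pipeline, the space-separated strings for all genes could be passed to scikit-learn's `CountVectorizer` to build a bag-of-words feature matrix for a classifier.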

About the Author

Daivi is a highly skilled Technical Content Analyst with over a year of experience at ProjectPro. She is passionate about exploring various technology domains and enjoys staying up-to-date with industry trends and developments. Daivi is known for her excellent research skills and ability to distill

Bioinformatics

Bioinformatics is an interdisciplinary field that intersects with biology, computer science, mathematics and statistics. It concerns itself with the development and use of methods and software tools for collecting and analyzing biological data.

Here are 9,185 public repositories matching this topic...

Developer-y / cs-video-courses

List of Computer Science courses with video lectures.

Updated May 9, 2024

plotly / dash

Data Apps & Dashboards for Python. No JavaScript Required.

Updated May 15, 2024

biopython / biopython

Official git repository for Biopython (originally converted from CVS)

Updated May 13, 2024

google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

Updated Mar 19, 2024

seandavi / awesome-single-cell

Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.

Updated Mar 27, 2024

danielecook / Awesome-Bioinformatics

A curated list of awesome Bioinformatics libraries and software.

Updated Apr 2, 2024

nextflow-io / nextflow

A DSL for data-driven computational pipelines

Updated May 17, 2024

OpenGene / fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)

Updated Apr 7, 2024

scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.

lh3 / minimap2

A versatile pairwise aligner for genomic and spliced nucleotide sequences

Updated May 18, 2024

allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.

Updated Mar 30, 2024

broadinstitute / gatk

Official code repository for GATK versions 4 and up

bioconda / bioconda-recipes

Conda recipes for the bioconda channel.

lh3 / bwa

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)

Updated Apr 15, 2024

galaxyproject / galaxy

Data intensive science for everyone.

lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats

Updated Oct 24, 2023

soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite

Updated May 14, 2024

shenwei356 / seqkit

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

MultiQC / MultiQC

Aggregate results from bioinformatics analyses across many samples into a single report.

lightaime / deep_gcns_torch

Pytorch Repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020) and GNN1000(ICML'2021): https://www.deepgcns.org

Updated Jul 31, 2022

Available Projects in Bioinformatics and Machine Learning

Discriminative graphical models for protein sequence analysis (joint project with Sanjoy Dasgupta)
Embedding sequences into Euclidean spaces
Discovering the genetic basis of human disease
Statistical and algorithmic aspects of motif discovery
Promoter discovery in Drosophila
Promoter modeling in bacteria and yeast
Regulatory aspects of human disease

Bioinformatics articles from across Nature Portfolio

Bioinformatics is a field of study that uses computation to extract knowledge from biological data. It includes the collection, storage, retrieval, manipulation and modelling of data for analysis, visualization or prediction through the development of algorithms and software.

Decoding cell replicational age from single-cell ATAC-seq data

The replicational age of single cells provides a temporal reference for tracking cell fate transition trajectories. The computational framework EpiTrace measures cell age using single-cell ATAC-seq data, specifically by considering chromatin accessibility at clock-like genomic loci, enabling the reconstruction of the history of developmental and pathological processes.

Annotating cell types in single-cell ATAC data via the guidance of the underlying DNA sequences

SANGO efficiently removed batch effects between the query and reference single-cell ATAC signals through the underlying genome sequences, to enable cell type assignment according to the reference data. The method achieved superior performance on diverse datasets and could detect unknown tumor cells, providing valuable functional biological signals.

Latest Research and Reviews

Invariant γδTCR natural killer-like effector T cells in the naked mole-rat

Naked mole-rats are long-lived rodents known to be resistant to the development of cancer, yet their immune system remains poorly explored. Here, the authors identify natural killer-like effector γδ T cells that express a dominant γδ T cell receptor and may serve a role in tumour immunosurveillance.

Guillem Sanchez Sanchez
Stephan Emmrich
David Vermijlen

Bioinformatics leading to conveniently accessible, helix enforcing, bicyclic ASX motif mimics (BAMMs)

Researchers mimic protein interface helices by stapling peptide side chains, or replacing hydrogen bonds with covalent ones, and synthetic helical mimics are heavily biased towards stapling. Here the authors describe bioinformatic discovery of hydrophobic triangles at helix N-termini, and rigid, bicyclic synthetic mimics of them.

Tianxiong Mi
Duyen Nguyen
Kevin Burgess

Using optical coherence tomography to assess luster of pearls: technique suitability and insights

Lifeng Zhou
Zhengwei Chen

Identification and validation of microbial biomarkers from cross-cohort datasets using xMarkerFinder

This protocol is for using xMarkerFinder, a four-stage computational framework, to enable the identification and validation of reproducible microbial biomarkers from cross-cohort studies, and establish potential microbiome-induced mechanisms.

Wenxing Gao

Enzyme-assisted high throughput sequencing of an expanded genetic alphabet at single base resolution

The expansion of the genetic code with synthetic nucleotides has broadened our ability to evolve DNA as a functional material, but we lack analytical tools for the expanded alphabet. Here the authors demonstrate an enzyme-assisted method for the sequencing of six-letter DNA.

Kevin M. Bradley
Steven A. Benner

Stellae-123 gene expression signature improved risk stratification in Taiwanese acute myeloid leukemia patients

Yu-Hung Wang
Adrián Mosquera Orgueira
Hwei-Fang Tien

News and Comment

Complement(ing) the microbiome in infants through breastmilk

Samuel P. Nobs
Eran Elinav

‘Wildly weird’ RNA bits discovered infesting the microbes in our guts

Rod-shaped structures named ‘obelisks’ are even smaller than viruses but can still transmit instructions to cells.

Saima Sidik

It’s me, hi, I solved the problem, it’s TF-seqFISH

Olivia Gautier
Aaron D. Gitler

UniBind: a novel artificial intelligence-based prediction model for SARS-CoV-2 infectivity and variant evolution

Jincun Zhao

General Information

There are plenty of opportunities for Bioinformatics research projects at UCLA. This program is designed to help interested students find research projects related to Bioinformatics across campus. Typically, these projects are for credit; in exceptional circumstances they may offer funding. Participation in research projects can both significantly improve your chances of admittance into top graduate programs and make you a much more competitive employment candidate. Even better, it gives you something to talk about during an interview. Feel free to contact us even if you are not sure whether you want to work on a research project or which field you wish to research. Please remember that every undergraduate and masters student is welcome to participate in research, regardless of background or year in the program. Undergraduates are STRONGLY encouraged to participate in research as early as possible in their careers. Ideally, you should start a research project during your sophomore year, but it is never too late or too early to start! Undergraduate students may receive up to 8 units of credit toward the minor with enrollment in Computer Science 194/199 or Bioinformatics 194/199.

General Procedure

If you are reasonably sure which project you would like to work on, use the contact information listed under the project to contact the person responsible for the project directly and set up a meeting. If you are not sure, but you are even slightly interested in research, feel free to email us or drop in for help choosing an appropriate project. Most students take a project for course credit, although funding may be available in some cases. You can contact Eleazar Eskin (eeskin [at] cs [dot] ucla [dot] edu) if you have any questions.

Research Projects

Below is a list of research projects that are accepting undergraduate researchers.

College of Agricultural Sciences

Project Examples

Bioinformatics

Here are some examples of Bioinformatic analyses we have expertise in conducting.

We have experience working with many diverse data and organism types, so even if your topic is not listed in our project examples, we are likely to be able to assist you.

Deliverables for Basic/Standard Analysis

MAX Turnaround time – 2 months depending on application and sample size

1. Whole Genome Sequencing

Prokaryotes.

RE-SEQUENCING: Raw Data QC and Report, Alignment Statistics and Report, Variation Calling Report (SNP, InDels), Gene Annotation Table with Variations.
DENOVO: Raw Data QC and Report, Assembly Statistics and Report, Genome Finishing using Closest homolog, rRNA identification and analysis report, Phage Identification and analysis report, Plasmid Identification, and analysis report, RAST Annotation.
RE-SEQUENCING/TARGETED/EXOME: Raw Data QC and Report, Alignment Report, Variation calling Report (SNP, InDels), Basic Variation Annotation, and Effect Analysis Report.
DENOVO: Raw Data QC and Report, Assembly Statistics and Report, Gene Prediction and Annotation Report. Data generation depends on predicted genome size

2. Transcriptome Sequencing

RE-SEQUENCING: Ribo Depletion (rRNA Depletion) – Raw Data QC and Report, Read Alignment to reference genome and transcript Identification, Comprehensive Transcript Annotation, Functional Classification of Annotated Transcript, Expression Profiling, Quantification & Expression Profiling of transcripts, Differential Analysis among the conditions, Biological Significance Analysis of differentials.
RE-SEQUENCING: – Raw Data QC and Report, Read Alignment to reference genome and transcript Identification, Quantification & Expression Profiling of transcripts, Differential Analysis among the conditions, Biological Significance Analysis of differentials (n-1). All pictorial representations of comparisons will be according to n-1.
DENOVO: Raw Data QC and Report, De novo assembly, Assembly Evaluation & Filtering, Sequence homology-based Transcript Annotation using Blast2Go – REFSEQ, Expression Profiling, Differential Analysis among the conditions, Biological Significance Analysis of differentials (n-1). All pictorial representations of comparisons will be according to n-1.

3. ChIP Sequencing

Raw Data QC and Report, Alignment Report, Peak Identification, and Enrichment Report, Peak Annotation Report

4. Metagenome Sequencing

Sample Grouping or individual as per experimental design, Group-wise OTU Clustering and abundance Report, OTU identification and taxonomic annotation Report (Sample Wise – Genus Level) and OTU Fasta file will be provided, Pie chart representation of TOP 10 taxonomic classification; phylum to species-level.

5. SmallRNA Sequencing

Sample wise Raw Data QC, Unique tags and abundance Report, Known Small RNA analysis report, Identification and Quantitation of Known miRNAs, Expression Profiling and Differential Expression Analysis of Known miRNAs.

6. Microbiome Sequencing

Pre-processing of reads including Quality Filtering, trimming low-quality reads, De-Replication, Sequence reconstruction and grouping, Gene prediction, Functional Annotation.

Deliverables for Advanced Analysis

MAX TAT – 3 months depending on the project requirement and sample size

RE-SEQUENCING: Raw Data QC and Report, Alignment Statistics and Report, Variation Calling Report (SNP, InDels), Gene Annotation Table with Variations, Structural Variations (Inversion, Deletion, Insertion, Translocation, Transversion) analysis report, Comparative Genome analysis – Across selected genomes, High SNP and Low SNP Region, Genic and Non-Genic SNPs, SNP Density Analysis, Synonymous and Non-synonymous SNPs, Effect of Frameshift Indels on Gene Prediction, Submitting Data to NCBI-SRA, Support in providing write up on methods for the manuscript purpose (Time Limit: 3-6 months)
DENOVO: Raw Data QC and Report, Assembly Statistics and Report, Genome Finishing using Closest homolog, rRNA identification and analysis report, Phage Identification and analysis report, Plasmid Identification and analysis report, 16S rRNA-based Phylogeny, COG Analysis, Interproscan Analysis, AAI and ANI analysis with the selected reference genome, Antibiotic resistance gene analysis with reference to transposable elements, PAN and Core genome analysis, Synteny Analysis, Chromosome Mapping, Plasmid Re-construction from whole-genome, Submitting Data to NCBI-SRA, Support in providing write up on methods for the manuscript purpose (Time Limit: 3-6 months)
RE-SEQUENCING/TARGETED/EXOME: Raw Data QC and Report, Alignment Report, Variation calling Report (SNP, InDels), Basic Variation Annotation and Effect Analysis Report, All the deliverables from Standard Analysis, Structural Variation Analysis Report, Variation Effect Analysis Report, Pathway and GO analysis of variations, Copy Number Variation Analysis, Data Submission to NCBI, Comparative Exome Analysis, Submitting Data to NCBI- SRA, Support in providing write up on methods for the manuscript purpose.
DENOVO: Raw Data QC and Report, Assembly Statistics and Report, Gene Prediction and Annotation Report, Prediction of rRNAs, tRNAs, Repeat Analysis, Identification of Transposons, Domain Identification, Analysis of Virulence genes, Analysis of CAZymes, Synteny Analysis, Comparative Exome Analysis, Submitting Data to NCBI-SRA, Support in providing write up on methods for the manuscript purpose.
RE-SEQUENCING: Ribo Depletion (rRNA Depletion) – Raw Data QC and Report, Read Alignment to reference genome and transcript Identification, Comprehensive Transcript Annotation, Functional Classification of Annotated Transcript, Expression Profiling, Quantification & Expression Profiling of transcripts, Differential Analysis among the conditions, Biological Significance Analysis of differentials, Inter and Intra Gene List Comparisons, Gene and Pathway enrichment analysis, GO and Pathways based Gene Regulatory Network Modelling, Submitting Data to NCBI- SRA, Support in providing write up on methods for the manuscript purpose.
RE-SEQUENCING: Raw Data QC and Report, Read Alignment to reference genome and transcript Identification, Expression Profiling, Quantification & Expression Profiling of transcripts, Differential Analysis among the conditions, Biological Significance Analysis of differentials, Inter and Intra Gene List Comparisons, Gene and Pathway enrichment analysis, GO and Pathways based Gene Regulatory Network Modeling, Functional classification of expressed transcripts, Submitting Data to NCBI-SRA, Support in providing write up on methods for the manuscript purpose.
DENOVO: Raw Data QC and Report, De novo assembly, Assembly Evaluation & Filtering, Sequence homology-based Transcript Annotation using Blast2Go – NRDB, Expression Profiling, Differential Analysis among the conditions, Biological Significance Analysis of differentials, Sequence homology-based Transcript Annotation against the customized database, Inter and Intra Gene List Comparisons, Gene and Pathway enrichment analysis, Functional Classification of Annotated Transcript, GO and Pathways based Gene Regulatory Network Modeling, Data to NCBI-SRA, Support in providing write up on methods for the manuscript purpose.

Raw Data QC and Report, Alignment Report, Peak Identification, and Enrichment Report, Peak Annotation Report, Motif Identification, Statistical analysis of Peak Reproducibility (If replicates are provided), Significant GO and Pathway Analysis, Data to NCBI-SRA, Support in providing write up on methods for the manuscript purpose.

Sample Grouping/Individual (either one) as per experimental design, Group-wise OTU Clustering and abundance Report, OTU identification and taxonomic annotation Report (Sample Wise – Genus Level) and OTU Fasta file will be provided, Pie chart representation of TOP 10 taxonomic classification (Phylum to Species-level), Differential Metagenome based on sample conditions, Diversity Analysis (Alpha and Beta), Rarefaction Curves, PCoA Plot (requires a minimum of six samples), Krona Plot at the genus level, Heat-Maps for comparisons, Species-level annotation (If V3 & V4 is covered), Data to NCBI-SRA, Support in providing write up on methods for the manuscript purpose.

Raw Data QC and Report, Known Small RNA analysis report, Identification and Quantitation of Known miRNAs, Expression Profiling and Differential Expression Analysis of Known miRNAs, Novel miRNA Identification (In case of reference genome availability) and analysis report, Characterization of other small RNAs like siRNA, piRNA, snoRNA, miRNA Target Prediction / Identification, Significant GO and Pathway Analysis of targets of differentially expressed miRNAs, Data to NCBI-SRA, Support in providing write up on methods for the manuscript purpose.

Pre-processing of reads including Quality Filtering, Trimming low quality reads, De-Replication, Sequence reconstruction and grouping, Gene and regulatory element prediction, Functional Annotation, Differential Microbiome based on sample parameters, Statistical analysis of Microbiome based on OTUs, Diversity Analysis (Alpha and Beta), Rarefaction Curves, Species-level annotation, Seed Subsystem classification, COG, KEGG Analysis, Gene Ontology and Pathway Analysis (Functional Microbiome Analysis), Data to NCBI-SRA, Support in providing write up on methods for the manuscript purpose.

Open Access

Bioinformatics Projects Supporting Life-Sciences Learning in High Schools

Affiliation Instituto Gulbenkian de Ciência, Oeiras, Portugal

Affiliation Escola Secundária Stuart de Carvalhais, Queluz, Portugal

* E-mail: [email protected]

Isabel Marques,
Paulo Almeida,
Renato Alves,
Maria João Dias,
Ana Godinho,
José B. Pereira-Leal

Published: January 23, 2014

https://doi.org/10.1371/journal.pcbi.1003404

The interdisciplinary nature of bioinformatics makes it an ideal framework to develop activities enabling enquiry-based learning. We describe here the development and implementation of a pilot project to use bioinformatics-based research activities in high schools, called “Bioinformatics@school.” It includes web-based research projects that students can pursue alone or under teacher supervision and a teacher training program. The project is organized so as to enable discussion of key results between students and teachers. After successful trials in two high schools, as measured by questionnaires, interviews, and assessment of knowledge acquisition, the project is expanding by the action of the teachers involved, who are helping us develop more content and are recruiting more teachers and schools.

Citation: Marques I, Almeida P, Alves R, Dias MJ, Godinho A, Pereira-Leal JB (2014) Bioinformatics Projects Supporting Life-Sciences Learning in High Schools. PLoS Comput Biol 10(1): e1003404. https://doi.org/10.1371/journal.pcbi.1003404

Editor: Fran Lewitter, Whitehead Institute, United States of America

Copyright: © 2014 Marques et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was funded by the Instituto Gulbenkian de Ciência. The funders had no role in the preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Background and Motivation

Our lives are increasingly touched by science and technology, from the everyday activities of browsing the internet, taking a prescription drug, etc., to major societal discussions involving, for example, genetically modified foods, cloning, or stem cells. It is therefore imperative that we engage young people in science. We witnessed in the past shrinking numbers of students choosing science degrees for their university education [1]. This trend seems, however, to have been inverted both in Europe and in the United States [2], [3]. A recent study points to the development of new and more attractive curricula and teaching methods as the driver for this increased interest [3]. In light of the growing evidence of a direct link between attitudes towards science and the way science is taught [1], there is increasing recognition of the need to couple the traditional teacher-centred “deductive approach” to the learner-centred “inductive approach,” relying on observation, experimentation, and teacher guidance in constructing students' knowledge. This “bottom-up” approach, called enquiry-based learning (also known as problem-based learning or case-based learning) [4], recapitulates the scientific process (raising questions, collecting data, reasoning, reviewing evidence, drawing conclusions, and discussing results), thus promoting both ideas of science (scientific concepts) and ideas about science (process, practices, and critical thinking), i.e., about the Nature of Science (NOS).

Bioinformatics is a discipline at the intersection of biology, computer science, information science, mathematics, and to some extent also of chemistry and physics. It developed in response to the increasingly complex data types and relationships in biological research, addressing the need to manage and interpret biological information. This interdisciplinary nature makes bioinformatics an ideal framework to engage high school students, as it illustrates the interplay between different scientific areas, while touching on many aspects that are relevant to the younger generations, such as health and the environment. This has been recognized by many others who have implemented bioinformatics-training programs. Examples are a web-based, problem-oriented approach aimed at introducing students to bioinformatics [5] and the use of bioinformatics activities as a way to teach evolution [6] or notions of polymorphisms in the context of human genetic variation and disease [7]. Bioinformatics has also been integrated with wet-lab activities in initiatives like the student-aimed “Cus-Mi-Bio” project [8], which includes gene finding activities, or in projects aimed at high school and college teachers, such as the ones at the Dolan DNA learning centre of Cold Spring Harbor Laboratory involving plant genome annotation [9]. More recently, activities that aim to introduce high school students to bioinformatics itself have also been reported [10], and, as of 2012, an exercise using the Basic Local Alignment Search Tool (BLAST) has been included on the Advanced Placement high school biology national test in the US (http://apcentral.collegeboard.com/apc/members/courses/teachers_corner/218954.html). Note, however, that these are likely isolated cases rather than the norm, as a survey revealed that in 2008 bioinformatics was still absent from the classroom in the US [11], and likely elsewhere.

The “Bioinformatics@school” Program

We run a Bioinformatics Core at the Instituto Gulbenkian de Ciência, in Portugal, that has long been engaged in outreach activities. In 2007, we decided to implement a genomics/bioinformatics activity that would enable enquiry-based learning; link to the national curricula in biology in secondary education; introduce students to bioinformatics, genomics, and molecular biology, areas that underlie many of the key debates and products in our societies; foster active learning, making use of technologies that younger generations are increasingly comfortable with; and help teachers incorporate the latest advances in science into their teaching. We developed a prototype system, and here we describe its development, its implementation, and the results of nearly five years of activity.

We developed and implemented a framework for the use of bioinformatics-based research projects in high schools to support the life-sciences curricula, which we named, in Portuguese, “Bioinformática na escola,” loosely translating to “Bioinformatics@school.” It consists of research projects that may be conducted independently by high school students of different ages, either under direct teacher supervision or as homework. Each work unit in a research project is designed to be carried out in 90 minutes, which is a standard class length in Portuguese high schools. We implemented it as a web portal (Figure 1A, 1B) at www.bioinformatica-na-escola.org. Although primarily written in Portuguese, the site makes use of external, freely accessible bioinformatics tools and databases available in English. This is not a problem for Portuguese high school students, who typically start learning English at the age of nine. Because of the dependency on external sites, we have ensured that students are given alternative access to any data on which progression to the following activity depends.


( A ) Screenshots of the home page. ( B ) Screenshots of exercises pages.

https://doi.org/10.1371/journal.pcbi.1003404.g001

The whole program is structured as a set of projects with open-ended questions. A project may have a single activity or several, each having focused questions. Answering these focused questions enables students to discuss and/or solve the project's main question. Individual activities in the multi-activity projects were designed to also be used independently (discussed below). The concept lends itself both to classroom use, individually or in pairs, and to homework. We designed individual activities to explore specific concepts that are part of the school curriculum and the projects to be coherent with the curriculum of specific age groups, with the active collaboration of teachers in choosing the topics.

Projects are organized as follows. Once a project is selected, the student has access to a page that summarizes the problem to be solved and a link to the first activity. As the student enters one activity, s/he is presented with a sequential series of pages, each giving some background information on the specific problem the student has to follow and a brief description of the bioinformatics resources/tools to be used. At the end of each activity, the student is taken to a summary page (“now you know that…”) with an overview of the basic concepts that were addressed in the activity. All pages include links to additional information on key concepts, mostly on Wikipedia ( www.wikipedia.org ), including explanations about the resources and algorithms used in the analysis. Once the activity/final activity is complete, the student is taken to a summary page that reviews the key concepts of the project as a whole and a series of questions that act as primers for discussion amongst students and with the teacher(s) (see Figure 1A, 1B for screenshots). Table 1 summarizes the questions, concepts, and software and resources that are covered in each individual activity of “Vision,” the first multi-activity project that we have implemented in the Bioinformatics@school portal (further detailed in Text S1 ). Its implementation in schools is discussed below.

https://doi.org/10.1371/journal.pcbi.1003404.t001

Implementing “Bioinformatics@school”

Iterative development of project modules.

We started to develop “Bioinformatics@school” as a pilot project in 2007, in close collaboration with high school students and teachers. The first stage of the project consisted of identifying the topics within the high school curricula that would be amenable to bioinformatics treatment, as well as the ideal school year for the pilot to be developed. We chose 12th grade biology, in the last year of high school in the Portuguese educational system, as its curriculum included multiple themes that were ideal to address using bioinformatics (such as genes, genomes, genetics, evolution, mutation, etc.), and these students would all have had several years of English language schooling (discussed above). The next phase of the project consisted of enrolling schools. Two secondary schools located in the Lisbon area were recruited, representing two different demographics. Escola Secundária Miguel Torga (ESMT), in Queluz, is a large suburban school that covers a variety of social strata, while Escola Secundária Quinta do Marquês (ESQM), in Oeiras, is located in a high income area with high levels of graduates and post-graduates. We engaged seven 12th grade Biology teachers, two from ESQM and five from ESMT. One hundred and fifty students were involved in this initial pilot phase, representing multiple science-related career ambitions, including engineering, health, biology, psychology, and sports.

We conceived the general framework described above and developed the first full project consisting of five activities, aimed at understanding how animals see different colours (“Vision” project, Table 1 ). We chose this question because we believed it to be sufficiently intriguing and relevant to engage the students (natural variants cause differential colour perception between species and between different people), but also for practical reasons—the biology of light detection via opsins is well understood, as is the 3D structure of opsins. The aim of the project is to motivate a discussion about evolution, molecular mechanisms, and disease, all inferred from bioinformatics analysis, while helping teachers and students engage with specific topics of the Life Sciences curriculum via the individual bioinformatics activities.

An innovative aspect of this project was the collaboration between scientists, teachers, and students on different aspects of the development, implementation, and testing—a three-way dialogue with continual updating in response to feedback of students and teachers. The development was iterative, first within our Bioinformatics Unit, and then in discussions with teachers. Once a first prototype was in place, one of us (IM) went to the schools to guide the students in the first activities of the project, with the help of the teacher. Student feedback was then used to improve the activities, in terms of rationale, language, and presentation.

Teacher training

Keeping up to date with the rapid developments in genomics and bioinformatics represents a challenge for high school teachers, particularly when many may have completed their training decades ago. In fact, in our experience, bioinformatics is a novel subject area to most Portuguese high school teachers. This led us to implement a parallel teacher-training program, again co-developed with the first set of teachers. Teachers were trained by bioinformatics experts, with the main goals of enabling them to guide the students in the bioinformatics-based projects and to understand the basics of the bioinformatics methods and resources underlying each activity. We developed a teacher's manual that described the activities step by step and provided additional background information for the teacher to be comfortable with all the concepts in each activity. The teacher training consisted of having the teachers follow the same activities as the students, with the help of the teacher's manual and under the supervision of a bioinformatician. We have expanded the teacher training to include seminars about applications of bioinformatics to human health, biotechnology, etc. A typical teacher training program lasts about 25 hours.

Extending the Program and Sustainability

After the successful pilot stage in 2007, the project expanded to other geographical areas of Portugal. Thirty-three new schools have joined the program, some via previously engaged teachers who took the program with them when they moved to a new school, others via new teachers who contacted us after hearing about the project and asked us to help them implement it in their schools. In total, schools in 11 municipalities in four Portuguese districts are currently following the program ( Figure 2A ). On their own initiative, some teachers have adapted the individual activities within the “Vision” project for use with younger students. They have also picked individual activities or subsets of activities and re-used them with different genes/systems, combining them in novel ways, to create new projects. They have also engaged with us to develop new projects (“Tasting Bitter”) and activities (“Tree of Life”). Furthermore, teachers are recruiting and training new teachers to use our activities. Interestingly, we observed that teachers tailored the activities to their own teaching style, some engaging the students almost at every mouse click, whereas others would only focus on explaining the basic ideas at the beginning and then discussing the outcomes at the end.

( A ) Map of schools participating, coloured by year of joining the project. ( B ) Summary of responses to confidential questionnaire. ( C ) Knowledge acquisition—each dot represents one class and the average score that students in that class achieved in the test before and after finishing the “Vision” project. ( D ) Confidence—each dot represents one class and the percent of answers that students in that class answered True or False, as opposed to answering “I don't know,” before and after finishing the “Vision” project.

https://doi.org/10.1371/journal.pcbi.1003404.g002

One aspect that worried us from early on was how to motivate teachers to engage with projects like ours when they are overwhelmed with teaching and administrative work. We realized that certification of the training is important for career progression within the Portuguese public educational system. We invested in having the project certified for teachers' continuous professional development by the national educational authorities (Conselho Científico-Pedagógico da Formação Contínua), thus making engagement with Bioinformatics@school even more appealing to the teachers. Recently we established a partnership with a teacher training centre (Centro de Formação Lezíria - Oeste) to enable other teachers in another Portuguese region to receive training in Bioinformatics activities and further promote the decentralization of “Bioinformatics@school.”

We have, thus, reason to believe that the use of the Bioinformatics@school platform is spreading on its own, with a dynamic beyond the ability of the small staff at the Bioinformatics Core that developed it.

Impact Assessment

We wished to evaluate how students and teachers perceive the program and to what extent it is an effective learning tool. These are independent questions that we addressed using different approaches. Conversations with students participating in the program suggested that they were motivated to participate in “hands-on” activities. To capture students' views beyond anecdotal opinions, we gave a simple confidential questionnaire to 150 students (two schools, seven classes) during the implementation phase of the project. The results are shown in Figure 2B and reveal that the majority of the students found the approach used in this project more motivating than traditional teaching methods (58%), and enjoyed participating in it (60%). About 80% considered it had not been a waste of time and 80% would recommend the project to next year's colleagues. This type of questionnaire is useful in gauging attitudes towards the program, but it has caveats, namely that the students at this stage were very involved with the development of the Bioinformatics@school project and may be overly positive because of that. In addition, it gives no information about student learning. To address this, we devised a simple test on the concepts explored in the program, with “True/False/I Don't Know” answers ( Table S1 ). We asked four classrooms to take the test before and after the activities (this test was irrelevant for their grades). Plotting the percentage of correct answers per student before and after the activities ( Figure 2C ) revealed a dramatic increase in the proportion of correct answers, indicating that students actually gain knowledge. One surprising result was that the students appeared more confident after doing the activities: they increasingly answered the test questions as false or true, rarely using “I don't know” ( Figure 2D ).
Since most of the concepts in our activities are part of the school curricula and were being covered in class by their teachers, we speculate that the decrease in “I don't know” answers may indicate that students are less afraid of venturing answers to scientific questions after doing the activities. Fear of science (“too complicated”) has been pointed out as a reason for the decreasing number of students pursuing scientific degrees [1] . This is an exciting finding that we will need to specifically evaluate further in the future. Regarding the teachers, we developed the whole program in close collaboration with them and obtained continuous feedback on the content and presentation. Although we have not as yet conducted a systematic evaluation of teachers' views about the program, the continuous contact with the currently more than seventy teachers involved suggests to us that this is a useful teaching/learning tool. In particular, teachers mention that these activities allow them to overcome the lack of laboratory-based practicals associated with some of the content in the curricula, like genetics and molecular biology. The fact that the program is spreading, with new teachers and schools recruited by word of mouth by the teachers themselves, underscores its interest and usefulness to teachers.
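The two per-class measures described above (knowledge, as the percentage of correct answers, and confidence, as the percentage of answers given as True or False rather than “I don't know”) can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the answer key, the answer sheets, the "T"/"F"/"?" encoding, and the function names are hypothetical and are not taken from the actual test in Table S1.

```python
from statistics import mean

# Hypothetical answer key; "T"/"F" are True/False and "?" stands for "I don't know".
ANSWER_KEY = ["T", "F", "T", "F"]

def percent_correct(answers, key=ANSWER_KEY):
    """Percentage of questions answered correctly ("?" never counts as correct)."""
    correct = sum(a == k for a, k in zip(answers, key))
    return 100.0 * correct / len(key)

def percent_committed(answers):
    """Percentage of questions answered True or False rather than "I don't know"."""
    committed = sum(a in ("T", "F") for a in answers)
    return 100.0 * committed / len(answers)

def class_summary(sheets):
    """Average both measures over all answer sheets from one class."""
    knowledge = mean(percent_correct(s) for s in sheets)
    confidence = mean(percent_committed(s) for s in sheets)
    return knowledge, confidence

# Made-up answer sheets for one class of two students, before and after the project.
before = [["T", "?", "?", "T"], ["?", "?", "T", "F"]]
after = [["T", "F", "T", "T"], ["T", "F", "?", "F"]]

print("before:", class_summary(before))
print("after: ", class_summary(after))
```

Keeping the two measures separate matters because a student can become more willing to commit to an answer without becoming more accurate; plotting them independently, one point per class, is what allows the two effects to be told apart.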

Discussion and Future Directions

In summary, we implemented a set of bioinformatics multi-activity research projects designed to enable enquiry-based learning in high schools. Assessment of this project has shown that students find it enjoyable and teachers believe it to be useful as a teaching aid. Objective assessment of knowledge acquisition revealed a clear positive effect both in knowledge and confidence of the students. Teachers have taken the initiative to adapt the activities to their own teaching settings and are also recruiting other teachers, which gives us further confidence in the usefulness of this project.

We have focused the projects on addressing specific biological questions, to serve the Life Sciences curriculum. This means that we don't explore the algorithmic or technological side of bioinformatics. For the future, we hope to engage teachers from mathematics, information technology, physics, and chemistry to develop projects that can serve the curricula of those particular subjects.

Recently, Form and Lewitter proposed a simple set of ten rules to guide the use of bioinformatics in high schools [12] . While these were not available at the time we were developing this project, it is interesting to note that we independently “discovered” several of these principles. We implemented individual activities with clear, simple goals (rule 1) that built on each other (rule 4), enabling students to “discover” concepts on their own (rules 5 and 8). Throughout this project we were always mindful that these activities need to serve the pre-existing curricula (rule 3). In the future we would like to have multiple projects serving the same concepts that would allow students in each class to choose an individual project (rule 6: personalization) that they could then present and contrast with other projects pursued by their colleagues (rule 10: produce a product). We would like to develop a mapping of activities to concepts in the curricula, so that it becomes even easier for teachers to mix and match the individual activities to different contexts, thus using our project as a means to empower the teachers. Based on our experience in setting up this program, we would like to suggest two additional “simple rules” that we believe to be important when developing content to be used in high schools:

Engage teachers and students in the development of the activities, as a means of empowering them and ensuring that the end product meets all the cognitive and pedagogical requirements (e.g., engage the teachers in choosing the specific topics of the curricula that would benefit from bioinformatics-based projects as well as to advise on time or practical constraints on their use in the school setting; engage both teachers and students to identify weak/unappealing points in the contents and formats of the activities and to suggest better solutions, etc.).
Evaluate the impact of the activities on engagement/enthusiasm for science and, in particular, on knowledge acquisition, as demonstrated effectiveness is the best way to get bioinformatics into the classroom. In our opinion, perpetuating useless activities just for the sake of their perceived modernity is more likely to harm the use of bioinformatics as a tool for high school science education than to advance it.

Our program was developed in Portuguese as it is targeted at Portuguese students. While this gives us potential access to a universe of more than 200 million Portuguese speakers worldwide, it is hard to use by speakers of other languages. We have started translating the whole set of activities into English, thus making Bioinformatics@school accessible to a much larger target audience. Equally, besides developing novel activities, we would like to adapt those from successful experiments elsewhere, and in due time will contact their authors directly. In this regard, the existence of a central repository of bioinformatics exercises to be used in high schools, with clear explanations according to pre-defined standards and mapping to specific concepts, would facilitate the adoption of bioinformatics in high schools. Developing standards and repositories should come naturally to the bioinformatics community!

Supporting Information

Questionnaire for impact assessment.

https://doi.org/10.1371/journal.pcbi.1003404.s001

Activities in the “Vision” project.

https://doi.org/10.1371/journal.pcbi.1003404.s002

Acknowledgments

We wish to thank all the high school teachers who have engaged with the Bioinformatics@school project, in particular Lurdes Louro (ESMT, Queluz), Filomena Delgado (ESQM, Oeiras), and Teresa Palma (Escola Secundária de Camões [ESdeC], Lisboa). We wish also to thank for their generosity and enthusiasm the initial batch of students from ESQM and ESMT who helped us develop ever better activities. Finally, we thank João Garcia and Gil Neto at the IGC, who provided invaluable IT support. We also wish to thank the Instituto Gulbenkian de Ciência for hosting this program.

2. Kang K (2012) Graduate Enrollment in Science and Engineering Grew Substantially in the Past Decade but Slowed in 2010. National Center for Science and Engineering Studies. Available: http://www.nsf.gov/statistics/infbrief/nsf12317/ . Accessed 20 December 2013.
3. Kearney C (2010) Efforts to Increase Students' Interest in Pursuing Mathematics, Science and Technology Studies and Careers. Wastiau P, Gras-Velázquez A, Grečnerová B, Baptista R, editors. Brussels: European Schoolnet. Available: http://cms.eun.org/shared/data/pdf/spice_kearney_mst_report_nov2010.pdf . Accessed 20 December 2013.

Translational Informatics


ISBDS: Project ideas and templates

Independent Study in Biomedical Informatics (ISBDS)

This document provides ideas for research projects, and links to research plan templates, which are partially completed plans. Template files are available via the ISBDS course GitHub repository . For ISBDS, a research plan template can vary within biomedical science topics, but definitely includes a specific data source, overall problem statement, and methodological approach. Students will be required to complete the template to comprise a preliminary research plan for approval prior to registration. Advisors are invited to contribute research plan templates in their areas of interest and expertise, which may be based on the generic ISBDS Research Plan Template .

General Suggestions

Review and analysis of an important public dataset.
Review and analysis of an important public informatics tool.
Reproducing and extending a published analysis.
Building a database from public sources for a biomedical topic of interest.
Adapt approaches, projects, and learning objectives from an existing MOOC or other online course (e.g. Coursera , edX , Johns Hopkins , Indiana , Stanford , Hasso Plattner ), with or without completing the course.
Respond to an online data science challenge (e.g. Kaggle ).
Building an online app for researchers, clinicians, or patients.
Create or improve an open source software package.

Bioinformatics

Network Analysis in Systems Biology (coursera.org)
Target Illumination GWAS Analytics (TIGA); see paper and repository .
Knowledge Graph Analytics Platform (KGAP); see paper and repository .
STRING: functional protein association networks
Systems Biology; Metabolic engineering for synthetic biology.
Structure to function.
GTEx Portal
Sequence alignment.

Cheminformatics

PubChem analysis, descriptive or predictive
ChEMBL analysis, descriptive or predictive
DrugCentral analysis, descriptive or predictive
Badapple analysis, descriptive or predictive

Drug Discovery

Bioactivity prediction by machine learning (see https://predictor.ncats.io/ , https://atomscience.org/ , https://drugcentral.org/Redial , https://deepchem.io/ ).
TEMPLATE: Homology Modeling (adapted from Intro to Biocomputing Unit 2 Assignment 1 and Assignment 2 )
TEMPLATE: Virtual Screening (adapted from Intro to Biocomputing Unit 3 Assignment 1 and Assignment 2 )
Chemical Predictive Modeling (Abhik Seal).
Knime for Cheminformatics (Abhik Seal).

Medical Informatics

OHDSI (Observational Health Data Sciences and Informatics): replicate, vary or extend published studies .
Open Medical Record System (OpenMRS): The global OpenMRS community works together to build the world’s leading open source enterprise electronic medical record system platform. https://wiki.openmrs.org/
Clinical Data Analysis in R (Abhik Seal)

Computational modeling

Tumor modeling
Bacterial infection modeling
Blood Sugar Regulation
Viral transmission modeling

Public Health & Epidemiology

Public Health: Big Cities Health Coalition (BCHC) and Big Cities Health Inventory (BCHI)
Healthcare Cost and Utilization: HCUP-US Databases
HealthData.gov
SEER-Medicare Health Outcomes Survey (SEER-MHOS) Linked Data Resource Surveillance, Epidemiology & End Results.
Medicare Provider Utilization and Payment Data https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Physician-and-Other-Supplier.html
CORD-19 , COVID-19 Open Research Dataset (CORD).
WHO Coronavirus (COVID-19) Dashboard With Vaccination Data .
Johns Hopkins Coronavirus Resource Center .
EMBL-EBI COVID-19 Data Portal

Fitness, Wellness, & Health

The Open Artificial Pancreas System project OpenAPS.org is an open and transparent effort to make safe and effective basic Artificial Pancreas System (APS) technology widely available to more quickly improve and save as many lives as possible and reduce the burden of Type 1 diabetes. OpenAPS means basic overnight closed loop APS technology is more widely available to anyone with compatible medical devices who is willing to build their own system .
ResearchKit & CareKit from Apple. CareKit allows developers to build apps that leverage a variety of customizable modules. CareKit apps will let users regularly track care plans, monitor their progress, and share their insights with care teams. CareKit is open source, so developers can build upon existing modules and contribute new code to help users worldwide create a bigger, and better, picture of their health.

Natural language processing (NLP) and text mining

PubMed named entity recognition (NER); see JensenLab Tools including Tagger .
Twitter sentiment analysis
Clustering by topic modeling
See code and projects from Jason Timm .

Databases and datasets

MHEALTH Dataset: body motion and vital signs.
Kaggle (over 50,000 public datasets and 400,000 public notebooks).
Aggregate Analysis of ClinicalTrials.gov (AACT) Database | Clinical Trials Transformation Initiative .
Hetionet – An integrative network of biomedical knowledge assembled from 29 different databases of genes, compounds, diseases, and more. The network combines over 50 years of biomedical information into a single resource, consisting of 47,031 nodes (11 types) and 2,250,197 relationships (24 types).
ROBOKOP (Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways): Robokop is a biomedical reasoning system that interacts with many biomedical knowledge sources to answer questions. Robokop is one of several prototype systems under active development with NIH NCATS.
Drug Central , online drug compendium.
Illuminating the Druggable Genome (IDG) : Pharos and Target Central Resource Database (TCRD) .
The openFDA FDA Adverse Event Reporting System (FAERS) is a database that contains information on adverse event and medication error reports submitted to FDA.
New Mexico Decedent Image Database
Embase , a highly versatile, multipurpose and up-to-date biomedical research and literature database.

Bioinformatics Project Ideas/Topics Collection For Engineering Students

Predicting Cellular Localization . Eukaryotic cells contain several sub-compartments. The cellular localization problem consists of predicting the compartment in which a protein is most likely to be found, on the basis of sequence information alone. The project may consist of a review of the literature and/or a novel analysis (I have access to a data-set that has never been used in a predictive context).
Regulatory-motifs . Review of the literature on algorithms to automatically determine regulatory motifs (short sequence signals) in DNA sequence data. I have a Java library that can be used to implement a prototype application; see suffix tree below.
SNP (Single Nucleotide Polymorphism) . Review the literature on methods for detecting SNPs, as well as their applications. Single nucleotide polymorphisms (SNPs) are common DNA sequence variations among individuals. They promise to significantly advance our ability to understand and treat human disease. (Excerpt from snp.cshl.org). See also Linkage analysis. (S)
Metabolic Pathways . Proteins interact to perform specific functions. Such a network of interactions is called a molecular pathway. There are two main aspects to this field: how to infer/determine the connections and how to simulate cellular processes. There exist several computational approaches to model molecular pathways, including Petri nets.
Molecular microarrays . Today's technology (which borrows from inkjet technology) makes it possible to fix tens of thousands of different macromolecules (DNA or protein molecules) onto a small surface. This technology reveals which macromolecules are expressed at different times, within different tissues, or in different cellular states (disease vs non-disease). DNA chips measure the level of expression of each gene.
Mass spectrometry (MS) . MS produces a spectrum of all the masses of all the compounds that are present in a sample. When an input protein is cut at specific sites, it will produce a specific spectrum. Such technology can now be used to fingerprint the content of a cell.
Expression data + motif discovery . DNA microarrays make it possible to find genes that are simultaneously expressed. Those genes are most likely co-regulated, i.e. they share a common sequence signal in their promoter region. Daniela Cernea implemented a suffix tree library in Java, in the context of her honours project. Here, we would be re-using the library to help find conserved motifs.
Expression data + cell localization . Can the use of (predicted and experimental) data on cellular localization help distinguish between true and false positives when expression data is analyzed to find activators and inhibitors?
Genome comparison . Implementing a MUMmer-like algorithm using Daniela's suffix tree (Java) library. This involves writing a hybrid algorithm: k-band dynamic programming + suffix trees.
Genome rearrangements . Genomes evolve at several scales: from point mutations to large rearrangements. In the late 80s, it became evident that several closely related genomes had genes that were extremely similar (say 99% identical) to one another, but the order of genes along the chromosomes was not preserved. Review and present the main algorithms to compare entire genomes. Topics include: sorting by reversals (Sankoff), breakpoint graph, Hannenhalli and Pevzner algorithm.
Accurate Phylogenetic Reconstruction from Gene-Order Data.
Ontologies . What is an ontology? What tools and knowledge representation formalisms (languages) are available to support the development of ontologies? Give examples of ontologies. Present the problems associated with ontologies. An ontology is a controlled vocabulary (e.g. gene ontology). It helps resolve some of the problems associated with data integration.
Genome assembly . Because of physical limitations, only relatively short DNA sequences can be read (some 500 nt). For processing a complete genome, one approach, called shotgun sequencing, consists of sampling small reads (500 nt) at random locations along the chromosomes. The total number of reads is chosen so that the likelihood that each nucleotide is included in more than one read is high (typically each nt is part of 3, 5 or 10 reads). Computers are then used to stitch the reads together. One solution to this problem is related to the shortest superstring problem.
Grammatical frameworks for RNA structure . RNA secondary structure information can be represented using context-free grammars. As with most biological data, the information is better represented within a statistical framework. A Stochastic Context-Free Grammar (SCFG) has probabilities attached to its production rules. The two main issues with SCFGs are the parsing and the induction of the grammar. Review the literature on SCFGs (this includes COVE, infernal and pfold), and build a prototype parser in Java.
Predicting Gene-Gene (Protein-Protein) interactions . There exist a vast number of algorithms for predicting whether two genes will interact. These include: text-mining, co-location along the chromosomes, phylogenetic footprinting, etc.
Lattice models . Predicting the three-dimensional structure of a protein is a notoriously difficult problem. So much so that alternative problems have been devised to circumvent it: secondary structure prediction, inverse folding problem, etc. Some authors have also been studying simpler systems, such as 2D and 3D lattices. Create your own implementation; this includes an algorithm to efficiently search the folding space and a scoring function. Run some simulations.
Structure comparison methods . Review the literature on 3D structure comparison. Implement at least one algorithm. Input: two three-dimensional structures; output: a measure of distance (typically root-mean-square deviation expressed in Å), and a list of equivalent residues.
Methods for detecting trans-membrane helices . There is a class of transmembrane proteins whose secondary structure can be reliably predicted. Those proteins are mainly made of helices, such that if the loop connecting helices i and i + 1 is exposed to the inside of the cell, then the next one will be exposed to the outside of the cell. Use a Hidden Markov Model or Neural Network to reproduce this result.
Secondary Structure Prediction . Implement a secondary structure prediction method and compare its accuracy to known methods. Common choices for your implementation include: Neural Networks, Hidden Markov Models, and possibly decision trees.
Surface/Interior . Implement an algorithm to predict solvent accessibility. Common choices for your implementation include: Neural Networks, Hidden Markov Models, and possibly decision trees.
Applications of suffix trees . Use Daniela Cernea's suffix tree library and implement some of the following algorithms: linear time algorithm for finding the longest common substring of k strings (interestingly, Knuth had predicted that no linear time algorithm would be found for solving this problem), finding all maximal repetitive substrings in linear time, finding all maximum palindromes, k -mismatch algorithm.
Bio-Ethics . Bioinformatics deals with biological and medical data; accordingly, there are numerous related ethical issues. Should patenting genes be allowed? How should patient data be handled? How should genomic data be dealt with? Imagine that the analysis of a dataset allows conclusions to be drawn about a population, a religious group, people who live in a specific region, etc. The consequences can be severe: it could be that this group is more likely to suffer from certain diseases, and such information could be used by insurance companies, employers, etc. to screen candidates.
Genome motifs viewer . Construct a flexible graphical user interface to visualize shared motifs. Suggestions: make it 3D to ease viewing multiple strings. Motifs would be extracted from a suffix tree.
Teaching tools : interactive linear time construction of a suffix tree, showing the suffix links, interactive tools for software alignments.
Expectation-Maximization (EM) algorithm and some of its applications in molecular biology . EM is used for training certain Hidden Markov Models, Covariance Models and building phylogenetic trees. What is it? What are the main applications? Prototype implementation. (S)
Gibbs sampling . This technique forms the basis for several motif detection tools. What is it? What are the main applications? Prototype implementation. (S)
Bayesian networks . What are Bayesian networks? What is interesting about them? What are the bioinformatics applications of Bayesian networks? Carry out a small experiment. (S)
Predicting Phenotype from Patterns of Annotation, microarrays, etc . One of the goals of bioinformatics research is to transform molecular biology into a predictive science. For example, given a certain pattern of gene expression, detected by microarrays for example, what would be the best treatment (personalized medicine)? Survey the literature on the use of bioinformatics techniques to assist medical diagnosis, prognosis and treatments. Where are we heading? When will personalized medicine become a reality? How much data is needed? What problems remain to be solved?
Statistics behind BLAST . Good candidate for a multi-team project, where one team would focus on the statistics of word matching while the other would focus on hashing. Produce a Java implementation of hashing techniques for speeding up sequence alignment. The statistical analysis of hits requires a statistical background (S); the algorithmic part does not.
Constructing phylogenetic trees . Read an overview of the construction of phylogenetic trees using a neighbour-joining approach. For this project, you will produce a prototype implementation, in Java, of a modern method such as: quartet method, maximum likelihood or maximum parsimony. (S)
QSAR . One of the main bioinformatics contributions to drug discovery is the Quantitative Structure Activity Relationship analysis (QSAR); the other is molecular docking. QSAR analyses take as input a set of compounds and their relative activity/efficacy. It then finds the commonalities between those molecules. The commonalities are then used to design new/better drugs.
Molecular docking consists of predicting how two molecules will interact. This can either be two proteins or one protein and a small compound, such as a new drug. The two main factors that are taken into account are the shape and electrostatics of the two molecules.
BioJava is a large collection of classes for solving bioinformatics problems.
Java3D . A protein viewer was developed two years ago in the context of a CSI 4900 project. Extensions of this project could be considered.
Tandem repeats . Review the literature on tandem repeats detection and implement a prototype application. Tandem repeats are repeats of the form n , s.t. 2
Simultaneous alignment and structure prediction for two RNA sequences . Implement a simplified version of dynalign, where the secondary structure prediction is calculated using the Nussinov algorithm, i.e. it finds the structure with the maximum number of base pairs.
3 way genome alignments .
1. Testing for absence of secondary structure in combinatorial sets of DNA strands.
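
The base-pair maximization idea named in the RNA item above can be sketched as a short dynamic program. This is a minimal illustration, not the dynalign method itself; the function name, the allowed pairs (Watson-Crick plus G-U wobble), and the `min_loop` parameter (minimum number of unpaired bases inside a hairpin) are assumptions made here.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in an RNA sequence."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the maximum number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j stays unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k, leaving >= min_loop bases between.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]
```

The table is filled by increasing span, so every sub-interval a cell depends on is already available; the run time is cubic in the sequence length.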

Akshay Sanap "Resolving Complexity of Bioinformatic Algorithms using Python".

Hi, I'm umer currently trying to find out a better final project for masters in bioinformatics. Can u plz send me extra details related to " Predicting Gene-Gene (Protein-Protein) interactions . There exist a vast number of algorithms that allow to predict if two genes will be interacting. This includes: text-mining, co-location along the chromosomes, phylogenetic footprinting, etc. " this topic at [email protected]

Hello I am a student of bioinformatics ..I am final year student and I want to select molecular docking as my final year project.. So kindly provide me a dataset of this project and its coding as well ..

Hi, I am a student of bioinformatics, final year. I want to do good programming project using Neural Networks and deep learning. Suggest any idea or dataset to work upon.

can I get help with next-generation sequencing technique topic, to gene related to breast , pancreatic or lung cancer

good day! I am an undergraduate and I need project topics on biomedical informatics

CPSC 536A: Project Ideas

General remarks:

Bioinformatics Project Ideas for UG and PG

Bioinformatics is an interdisciplinary field that combines biology, computer science, and mathematics to analyze and interpret biological data. It plays a crucial role in genomics, proteomics, and other life sciences research. Undertaking bioinformatics projects at both undergraduate (UG) and postgraduate (PG) levels can provide students with valuable skills and contribute to cutting-edge research. In this article, we present a range of bioinformatics project ideas suitable for students at different levels of their academic journey.

Project 1: Genome Annotation and Functional Analysis

Genome annotation is a fundamental task in bioinformatics that involves identifying and characterizing the various elements within a DNA sequence. This project idea offers opportunities for both undergraduate (UG) and postgraduate (PG) students to delve into the exciting world of genomics while honing their bioinformatics skills.

Undergraduate Level: Annotate a Small Genome

At the undergraduate level, students can take on the task of annotating a small genome, such as that of a bacterium. This project provides a hands-on experience with essential bioinformatics tools like GeneMark or AUGUSTUS, which are widely used for gene prediction.

Data Acquisition: Begin by obtaining the DNA sequence of the chosen organism from a reputable database, such as GenBank.
Gene Prediction: Utilize tools like GeneMark or AUGUSTUS to predict the location of genes within the genome. These tools employ algorithms that analyze sequence features to identify potential genes.
Functional Annotation: Once genes are predicted, assign functions to them by searching for similarities in existing protein databases (e.g., BLAST searches against databases like NCBI’s NR or Swiss-Prot). This step helps in understanding the roles these genes play in the organism’s biology.
Regulatory Element Identification: Explore the genome for regulatory elements like promoters and enhancers, which control gene expression. MEME can assist in discovering new motifs, and FIMO in scanning sequences for occurrences of known motifs.
Functional Analysis: Analyze the functional categories of genes and identify pathways or biological processes they are involved in. This information can shed light on the organism’s biology and potential applications.
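
As a rough illustration of the gene prediction step above: dedicated gene finders such as GeneMark or AUGUSTUS rely on statistical models, whereas the sketch below is a much simpler stand-in that only scans the forward strand for open reading frames (an ATG followed in frame by a stop codon). The function name, the `min_codons` threshold, and the coordinate convention are assumptions chosen for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) pairs of forward-strand ORFs.

    Coordinates are 0-based, `end` is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)
```

A real bacterial annotation would also scan the reverse complement and consider alternative start codons; both are left out here to keep the sketch short.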

Postgraduate Level: Eukaryotic Genome Analysis

For postgraduate students, the project can be extended to working on a eukaryotic genome, such as that of a fungal species. This level of analysis offers a more complex challenge and opportunities for advanced techniques like comparative genomics.

Additional Steps for PG Students:

Comparative Genomics: Investigate evolutionary relationships by comparing the annotated genome to closely related species. Identify conserved genes and lineage-specific innovations, which can provide insights into the species’ evolutionary history.

Functional Enrichment Analysis: Perform enrichment analyses to identify overrepresented gene functions or pathways. This can help in understanding the biological significance of the genes identified and their potential roles in the organism.
Visualization: Create visual representations of the annotated genome, such as circular genome maps or synteny plots, to convey complex information effectively.
Publication: Encourage PG students to publish their findings in reputable scientific journals or present their work at conferences, contributing to the broader field of genomics.

Genome annotation and functional analysis projects offer a valuable learning experience for both undergraduate and postgraduate students in bioinformatics. These projects not only enhance students’ computational skills but also contribute to our understanding of the genetic makeup and biological functions of different organisms, from bacteria to eukaryotes.

Project 2: Metagenomics Analysis

Metagenomics is a rapidly evolving field that allows researchers to explore the genetic diversity and functional potential of entire microbial communities within environmental samples. This project idea offers engaging opportunities for both undergraduate (UG) and postgraduate (PG) students to dive into the world of metagenomics, from the analysis of small datasets to more complex and comprehensive projects.

Undergraduate Level: Analyze a Small Metagenomic Dataset

At the undergraduate level, students can embark on the analysis of a small metagenomic dataset obtained from environmental samples. This project provides hands-on experience with the foundational aspects of metagenomic analysis.

Steps for UG Students:

Dataset Selection: Choose a small metagenomic dataset, such as soil, water, or gut microbiome samples, from publicly available sources like NCBI’s Sequence Read Archive (SRA).
Data Preprocessing: Clean and preprocess the raw sequencing data by removing low-quality reads, adapters, and other artifacts.
Taxonomic Profiling: Utilize tools like Kraken, MetaPhlAn, or QIIME to identify and quantify the microbial taxa present in the sample. This step provides insights into the composition of the microbial community.
Diversity Assessment: Calculate diversity metrics (e.g., Shannon diversity index) to assess the richness and evenness of microbial species within the sample. Visualize diversity patterns using appropriate plots.
Functional Annotation: Predict the functional potential of the microbial community by aligning sequences to databases like KEGG or COG and assigning functional categories to the genes.
Ecological Inference: Infer potential ecological roles of detected microbes based on taxonomic and functional information. Are there any correlations between specific taxa and functions?
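
The diversity assessment step can be made concrete with a few lines of code. The sketch below computes the Shannon index (natural logarithm) and Pielou's evenness from per-taxon read counts; the function names and the idea of passing a plain list of counts are assumptions for illustration, and a real analysis would take the counts from a profiler's output table.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one read")
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)

def pielou_evenness(counts):
    """Evenness J' = H' / ln(S), where S is the number of observed taxa."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0  # evenness is not informative for a single taxon
    return shannon_index(counts) / math.log(observed)
```

For four equally abundant taxa the index equals ln(4) and the evenness is 1; skewed communities give lower values for both.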

Postgraduate Level: Advanced Metagenomics Analysis

For postgraduate students, the project can be scaled up to tackle more extensive metagenomics analysis, potentially focusing on human microbiota or complex environmental microbiomes.

Large Dataset Handling: Work with larger and more complex metagenomic datasets. Consider sequencing data from diverse human body sites (e.g., gut, skin, oral) or complex environmental niches (e.g., extreme environments, wastewater treatment plants).
Genome Binning: Use tools like MetaBAT, MaxBin, or CONCOCT to bin metagenomic contigs into draft genomes, allowing for a deeper understanding of individual microbial species within the community.
Functional Profiling: Employ tools such as HUMAnN or MEGAN to perform functional profiling, which provides insights into the metabolic potential of the microbial community.
Statistical Analysis: Apply statistical tests (e.g., differential abundance analysis) to identify significant differences in microbial composition or functional potential between sample groups.
Biological Interpretation: Investigate the ecological and physiological significance of the identified microbes and functions. Are there potential implications for human health or environmental processes?
Publication and Presentation: Encourage PG students to disseminate their findings through research publications or presentations at conferences, contributing to the growing field of metagenomics research.

In summary, metagenomics analysis projects offer a dynamic and multidisciplinary learning experience for both undergraduate and postgraduate students. These projects enable students to explore the intricate world of microbial communities, fostering skills in data analysis, bioinformatics, and ecological inference while making meaningful contributions to our understanding of diverse ecosystems.

Project 3: Protein Structure Prediction

Protein structure prediction is a critical area of bioinformatics that involves determining the three-dimensional arrangement of atoms in a protein molecule. It is a fascinating field with applications in drug discovery, understanding protein function, and more. This project idea provides opportunities for both undergraduate (UG) and postgraduate (PG) students to explore protein structure prediction, ranging from predicting secondary structures to tackling tertiary structure prediction and studying protein-ligand interactions.

Undergraduate Level: Predict the Secondary Structure

At the undergraduate level, students can begin by predicting the secondary structure of a protein sequence. This project offers insights into the fundamental aspects of protein folding and structure prediction.

Data Selection: Choose a protein sequence of interest, preferably from a well-studied organism, and obtain its amino acid sequence.
Secondary Structure Prediction: Utilize tools like PSIPRED or Porter to predict the secondary structure elements (e.g., alpha helices, beta strands) within the protein sequence.
Validation: Compare the predicted secondary structure with experimentally determined structures, if available, to assess the accuracy of the prediction.
Biological Implications: Explore how the secondary structure relates to the protein’s function or interaction with other molecules.
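
For the validation step, a common summary is the per-residue three-state accuracy (often called Q3). The sketch below assumes both the prediction and the reference are given as equal-length strings over the labels H (helix), E (strand), and C (coil); the function name and input format are illustrative choices, not the output format of any particular tool.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    if not predicted:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)
```

For example, a prediction that agrees with the reference at 8 of 10 positions scores 0.8.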

Postgraduate Level: Tertiary Structure Prediction and Protein-Ligand Interactions

For postgraduate students, the project can advance to tertiary structure prediction, which involves predicting the three-dimensional arrangement of atoms in the protein molecule. Additionally, students can delve into the study of protein-ligand interactions, which is essential for understanding drug binding and other biochemical processes.

Tertiary Structure Prediction: Select a protein for which the tertiary structure is not yet resolved or is of interest for further investigation. Employ advanced software like Rosetta or I-TASSER to predict the 3D structure of the protein.
Model Evaluation: Assess the quality of the predicted tertiary structure using metrics like RMSD (Root Mean Square Deviation) by comparing it to experimentally determined structures or high-quality reference models.
Protein-Ligand Docking: Learn about protein-ligand interactions by conducting molecular docking simulations. Use software like AutoDock or AutoDock Vina to predict the binding mode of small molecules (ligands) to the protein of interest.
Binding Affinity Calculation: Calculate binding affinities to estimate the strength of protein-ligand interactions. Understand the factors that contribute to ligand binding and specificity.
Biological Insights: Analyze the biological implications of the predicted protein structure and ligand interactions. How do these insights contribute to understanding the protein’s function or potential drug targets?
Publication and Presentation: Encourage PG students to share their findings through research publications or presentations at scientific conferences, contributing to the field of structural biology and drug discovery.
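
The RMSD used in the model evaluation step is simple to compute once the two structures are superimposed and the equivalent atoms are known. The sketch below assumes exactly that: two equally long lists of (x, y, z) coordinates, already aligned in space and in the same order. A full comparison would first find the optimal superposition (for instance with the Kabsch algorithm), which is not shown here; the function name is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples, in the input units."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))
```

If the coordinates are in ångströms, as in PDB files, the result is in ångströms as well.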

In summary, protein structure prediction projects provide valuable opportunities for both undergraduate and postgraduate students to develop skills in computational biology, structural bioinformatics, and molecular modeling. These projects not only deepen their understanding of protein structure and function but also have practical applications in various domains, including drug design and biomedical research.

Project 4: Phylogenetic Analysis

Phylogenetic analysis is a crucial aspect of evolutionary biology and bioinformatics that involves studying the evolutionary relationships among organisms. This project idea offers opportunities for both undergraduate (UG) and postgraduate (PG) students to engage in phylogenetic analysis, starting with constructing basic phylogenetic trees and progressing to more complex methods.

Undergraduate Level: Construct a Simple Phylogenetic Tree

At the undergraduate level, students can begin by constructing a basic phylogenetic tree based on a gene or protein sequence. This project provides a foundational understanding of phylogenetics and evolutionary relationships.

Gene or Protein Selection: Choose a gene or protein of interest that is well-documented and has sequences available for multiple organisms.
Sequence Alignment: Align the sequences of the chosen gene or protein using software like ClustalW or MAFFT to identify conserved regions.
Phylogenetic Tree Construction: Utilize software such as MEGA (which offers neighbor-joining and maximum parsimony) or PhyML (maximum likelihood) to construct a phylogenetic tree based on the aligned sequences.
Tree Visualization: Visualize the phylogenetic tree, highlighting the evolutionary relationships among the organisms.
Interpretation: Gain insights into the evolutionary history and relatedness of the organisms based on the tree’s topology. Consider factors like branching patterns and branch lengths.
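
Distance-based methods such as neighbor-joining start from a matrix of pairwise distances computed on the alignment. The sketch below shows one way such a matrix could be derived for nucleotide data: the proportion of differing sites (p-distance) followed by the Jukes-Cantor correction. The function names are hypothetical, gap columns are simply skipped, and tree-building programs implement more elaborate substitution models.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared

def jukes_cantor(p):
    """Jukes-Cantor corrected distance d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(aligned):
    """Pairwise Jukes-Cantor distances for a dict of name -> aligned sequence."""
    names = sorted(aligned)
    return {(x, y): jukes_cantor(p_distance(aligned[x], aligned[y]))
            for x in names for y in names}
```

The correction matters more as sequences diverge: it accounts for multiple substitutions at the same site, which the raw p-distance undercounts.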

Postgraduate Level: Complex Phylogenetic Analyses and Co-evolutionary Patterns

For postgraduate students, the project can advance to more complex phylogenetic analyses, incorporating maximum likelihood methods and exploring co-evolutionary patterns among genes or organisms.

Maximum Likelihood Analysis: Learn and apply maximum likelihood methods for phylogenetic tree reconstruction, which offer more accurate models of sequence evolution. Software packages like RAxML or PhyML can be used.
Molecular Clock Analysis: Investigate the concept of molecular clocks to estimate divergence times between species. This involves incorporating evolutionary rates into phylogenetic analyses.
Co-evolutionary Analysis: Explore co-evolutionary patterns between genes, proteins, or organisms using tools like Coevol or CAPS. Understand how changes in one component correlate with changes in another.
Advanced Tree Visualization: Use advanced tree visualization tools to create informative and publication-quality figures. Highlight key evolutionary events or relationships.
Biological Interpretation: Analyze the implications of the phylogenetic findings. How do the results contribute to our understanding of evolutionary processes, adaptations, or co-evolutionary dynamics?
Publication and Presentation: Encourage PG students to disseminate their findings through research publications or presentations at scientific conferences, contributing to the field of evolutionary biology and phylogenetics.
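
The molecular clock step rests on a simple relation under a strict clock: if two lineages are separated by a genetic distance d (substitutions per site) and substitutions accumulate at rate r per site per year along each lineage, the divergence time is T = d / (2r). The sketch below encodes only this back-of-the-envelope version; the function name is hypothetical, and dedicated dating software uses calibrations and relaxed-clock models instead.

```python
def divergence_time(distance, rate_per_site_per_year):
    """Strict-clock estimate T = d / (2r).

    The factor 2 reflects that the distance accumulates along both lineages
    since their common ancestor.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate_per_site_per_year)
```

With a distance of 0.02 substitutions per site and an assumed rate of 1e-9 per site per year, the estimate is about ten million years.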

In summary, phylogenetic analysis projects offer a captivating journey into the study of evolutionary relationships among organisms. These projects provide valuable insights into the evolutionary history of genes, proteins, and species, and they equip students with essential skills in bioinformatics and computational biology. Additionally, complex phylogenetic analyses enable postgraduate students to explore cutting-edge methods and contribute to our understanding of co-evolutionary dynamics in biology.

Project 5: Drug Discovery and Virtual Screening

Drug discovery is a multidisciplinary field that combines biology, chemistry, and computational methods to identify and design potential drug candidates. This project idea provides opportunities for both undergraduate (UG) and postgraduate (PG) students to explore the exciting world of drug discovery, starting with basic virtual screening experiments and progressing to advanced structure-based drug design.

Undergraduate Level: Basic Virtual Screening

At the undergraduate level, students can start by learning about drug databases and conducting basic virtual screening experiments to identify potential drug candidates. This project offers an introduction to the concepts and tools used in drug discovery.

Drug Database Exploration: Familiarize yourself with drug databases like PubChem or DrugBank. Select a target protein of interest, preferably one with known drug-binding sites.
Ligand Preparation: Retrieve ligand molecules (small compounds) from the database that may potentially bind to your target protein. Prepare the ligands for docking, for example by stripping counter-ions and solvent molecules, adding hydrogens, and generating 3D conformations.
Protein-Ligand Docking: Utilize software tools like AutoDock or PyRx to perform virtual docking experiments. Dock the prepared ligands into the binding site of the target protein and calculate binding energies.
Analysis: Analyze the docking results to identify potential drug candidates. Consider factors like binding energy, binding pose, and ligand-protein interactions.
Visualization: Visualize the binding interactions using molecular visualization software. Understand how the ligands interact with the target protein.

Postgraduate Level: Structure-Based Drug Design and Molecular Dynamics

For postgraduate students, the project can advance to structure-based drug design, including in-depth studies of protein-ligand interactions and the use of molecular dynamics simulations for drug candidate evaluation.

Protein-Ligand Interactions: Dive deeper into the study of protein-ligand interactions. Investigate specific binding modes, hydrogen bonds, hydrophobic interactions, and other molecular interactions between the ligands and the target protein.
Molecular Dynamics Simulations: Learn about molecular dynamics simulations using software like GROMACS or AMBER. Perform simulations to study the dynamic behavior of the protein-ligand complex over time.
Free Energy Calculations: Apply advanced techniques like free energy calculations to estimate binding affinities more accurately. Understand the thermodynamics of ligand binding.
Drug Candidate Evaluation: Evaluate the potential drug candidates based on their stability, binding affinity, and pharmacokinetic properties. Consider factors like drug-likeness, toxicity, and solubility.
Biological Interpretation: Analyze the biological relevance of the identified drug candidates. Explore their potential applications, therapeutic targets, and mechanisms of action.
Publication and Presentation: Encourage PG students to share their findings through research publications or presentations at scientific conferences, contributing to the field of drug discovery and structure-based drug design.
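
Drug-likeness, mentioned in the candidate evaluation step, is often screened first with Lipinski's rule of five. The sketch below counts rule violations from descriptors that are assumed to have been computed beforehand with a cheminformatics toolkit; the function names and the default of tolerating one violation are illustrative choices.

```python
def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count violations of Lipinski's rule of five for precomputed descriptors."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        log_p > 5,          # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)

def is_drug_like(mol_weight, log_p, h_donors, h_acceptors, max_violations=1):
    """True if the compound violates at most `max_violations` of the four rules."""
    return lipinski_violations(mol_weight, log_p, h_donors, h_acceptors) <= max_violations
```

The rule is a coarse filter for oral bioavailability, not a prediction of activity or toxicity, so it complements rather than replaces the other evaluation criteria.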

In conclusion, drug discovery and virtual screening projects offer a fascinating exploration of the intersection between computational biology and pharmaceutical research. These projects equip students with valuable skills in computational chemistry, molecular modeling, and drug development, making them well-prepared for careers in pharmaceuticals, biotechnology, and academic research.

Project 6: RNA-Seq Data Analysis

RNA-Seq is a powerful technique for studying gene expression at the transcript level. This project idea provides opportunities for both undergraduate (UG) and postgraduate (PG) students to gain experience in RNA-Seq data analysis, starting with the basics and progressing to more advanced techniques.

Undergraduate Level: Basic RNA-Seq Data Analysis

At the undergraduate level, students can begin by analyzing RNA-Seq data from a small experiment. This project introduces fundamental steps in RNA-Seq data analysis, including quality control, mapping, and differential gene expression analysis.

Data Acquisition: Obtain RNA-Seq data from a small-scale experiment, such as those available in public repositories like the NCBI Sequence Read Archive (SRA).
Quality Control: Perform quality control on the raw sequencing data to assess data quality and identify potential issues.
Read Mapping: Use tools like STAR or HISAT2 to map the sequenced reads to a reference genome or transcriptome.
Quantification: Estimate gene or transcript expression levels using software like featureCounts or StringTie.
Differential Expression Analysis: Identify genes that are differentially expressed between experimental conditions using DESeq2 or edgeR.
Visualization: Create visualizations, such as heatmaps or volcano plots, to illustrate the results of differential expression analysis.
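
To make the quantification step more tangible, the sketch below converts raw read counts into transcripts per million (TPM), a length- and depth-normalized expression unit. The function name and the dictionary inputs (gene id to count, gene id to length in base pairs) are assumptions for illustration. Note that DESeq2 and edgeR expect raw counts as input, so TPM values are better suited to visualization and comparison within a sample than to differential expression testing.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and gene lengths (dicts keyed by gene id)."""
    # Reads per kilobase: correct each count for gene length.
    rates = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("all counts are zero")
    # Scale so that the values of one sample sum to one million.
    return {g: r / total * 1_000_000 for g, r in rates.items()}
```

Two genes with equal counts but a twofold difference in length end up with a twofold difference in TPM, the shorter gene being the higher one.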

Postgraduate Level: Advanced RNA-Seq Analysis

For postgraduate students, the project can advance to handling more extensive RNA-Seq datasets and exploring advanced analyses, such as alternative splicing, pathway analysis, and functional enrichment.

Additional Steps for PG Students:

Large Dataset Handling: Work with larger RNA-Seq datasets, which may include multiple experimental conditions or time points. Implement strategies for efficient data processing.
Alternative Splicing Analysis: Investigate alternative splicing events using tools like rMATS or SUPPA. Understand the regulation of splicing and its impact on gene expression diversity.
Pathway Analysis: Perform pathway analysis to identify biological pathways that are significantly enriched with differentially expressed genes. Utilize tools like Enrichr or DAVID.
Functional Enrichment Analysis: Conduct functional enrichment analysis to gain insights into the biological functions and processes associated with differentially expressed genes. Explore tools like GOseq or clusterProfiler.
Visualization and Interpretation: Generate interactive visualizations, network analyses, and gene ontology plots to interpret the biological significance of the RNA-Seq data.
Publication and Presentation: Encourage PG students to communicate their findings through research publications or presentations at scientific conferences, contributing to the field of transcriptomics and functional genomics.
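
Many pathway and functional enrichment analyses are, at their core, over-representation tests. The sketch below computes a one-sided hypergeometric p-value: the probability of seeing at least the observed overlap between a list of differentially expressed genes and a gene set by chance. The function and parameter names are hypothetical, and tools like those named above add annotation handling and multiple-testing correction on top.

```python
from math import comb

def enrichment_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_universe: number of genes tested overall
    n_in_set:   genes in the universe that belong to the pathway or term
    n_selected: differentially expressed genes
    n_overlap:  differentially expressed genes that belong to the pathway or term
    """
    upper = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total
```

Because one such test is run per pathway or term, the resulting p-values should be adjusted for multiple testing (for example with the Benjamini-Hochberg procedure) before interpretation.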

RNA-Seq data analysis projects offer valuable hands-on experience in transcriptomics and bioinformatics. These projects equip students with essential skills in data analysis, statistical analysis, and biological interpretation, enabling them to contribute to our understanding of gene expression regulation and its implications in various biological processes and diseases.

Project 7: Network Analysis in Systems Biology

Network analysis is a powerful approach in systems biology, enabling the exploration of complex interactions between biological components. This project idea provides opportunities for both undergraduate (UG) and postgraduate (PG) students to engage in network analysis, ranging from constructing simple biological networks to investigating intricate networks and their implications in disease prediction.

Undergraduate Level: Building Simple Biological Networks

At the undergraduate level, students can begin by constructing a basic biological network, such as a gene regulatory network or a protein-protein interaction network, using publicly available data and Cytoscape software. This project introduces the fundamental principles of network construction and visualization.

Data Collection: Gather biological data related to the network of interest from public databases like NCBI or STRING. This data could include gene expression data, protein interactions, or molecular pathways.
Data Preprocessing: Clean and format the data to ensure it is suitable for network construction. Address any missing or inconsistent information.
Network Construction: Use Cytoscape or similar network analysis tools to build the biological network. Connect nodes (genes or proteins) based on established criteria, such as co-expression or experimentally validated interactions.
Visualization: Create a visually appealing and informative representation of the network. Customize the layout and styling to highlight node attributes, such as gene functions or expression levels.
Basic Analysis: Conduct basic network analysis, such as identifying highly connected nodes (hubs) and calculating network centrality measures (e.g., degree, betweenness).
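
The basic analysis step can also be reproduced outside Cytoscape to check one's understanding. The sketch below builds an undirected network from a list of interaction pairs, computes degree centrality, and reports the most connected nodes; the function names and the edge-list input are assumptions for this example.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from (node_a, node_b) pairs; self-loops are skipped."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)

def degree_centrality(adjacency):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}

def hubs(adjacency, top=1):
    """Nodes with the highest degree, ties broken alphabetically."""
    ranked = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    return ranked[:top]
```

Betweenness centrality requires shortest-path computations and is best left to a network library or to Cytoscape's own analysis tools.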

Postgraduate Level: Advanced Network Analysis and Disease Prediction

For postgraduate students, the project can advance to exploring more complex biological networks, analyzing network motifs, and using networks for disease prediction and analysis.

Complex Network Analysis: Work with larger and more intricate biological networks, including multi-layer networks or dynamic networks. Apply advanced network analysis techniques to uncover hidden patterns and structures.
Network Motif Analysis: Investigate network motifs, which are recurring subgraphs within the network. Analyze their significance and potential roles within the biological context.
Network-Based Disease Predictions: Explore the use of network-based approaches for predicting disease-associated genes or identifying potential drug targets. Employ methods such as random walk-based algorithms or diffusion-based techniques.
Biological Interpretation: Interpret the findings within the context of biology and disease mechanisms. Understand how network properties and motifs relate to biological processes or disease pathways.
Visualization and Reporting: Create comprehensive visualizations, such as pathway maps or interactive network diagrams, to illustrate the results of network analysis. Summarize the findings in research papers or reports.
Publication and Presentation: Share your research findings through publications or presentations at scientific conferences, contributing to the field of systems biology and network analysis.
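
The random walk-based approach mentioned under network-based disease prediction can be sketched as a random walk with restart. This is a minimal illustration, assuming an undirected, unweighted network in which every node has at least one neighbour; the node names, the `restart` value of 0.3, and the chain-shaped toy graph are all assumptions made for the example. Candidate genes that end up with higher scores are closer, in network terms, to the known disease genes used as seeds.

```python
# Hypothetical toy network: one known disease gene followed by a chain of
# candidate genes at increasing network distance.
GRAPH = {
    "known_gene": {"cand1"},
    "cand1": {"known_gene", "cand2"},
    "cand2": {"cand1", "cand3"},
    "cand3": {"cand2"},
}


def random_walk_with_restart(graph, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    At each step the walker jumps back to a seed with probability `restart`,
    otherwise it moves to a uniformly chosen neighbour. Assumes no isolated
    nodes, so that probability mass is conserved.
    """
    nodes = sorted(graph)
    start = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        nxt = {v: restart * start[v] for v in nodes}
        for v in nodes:
            share = (1.0 - restart) * prob[v] / len(graph[v])
            for w in graph[v]:
                nxt[w] += share
        change = sum(abs(nxt[v] - prob[v]) for v in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


def rank_candidates(scores, seeds):
    """Non-seed nodes ordered from highest to lowest score."""
    candidates = [v for v in scores if v not in seeds]
    return sorted(candidates, key=lambda v: scores[v], reverse=True)


if __name__ == "__main__":
    seeds = {"known_gene"}
    scores = random_walk_with_restart(GRAPH, seeds)
    for gene in rank_candidates(scores, seeds):
        print(gene, round(scores[gene], 4))
```

A higher restart probability keeps the walk closer to the seeds; a lower one lets the overall network topology dominate the ranking.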

Network analysis projects in systems biology offer an opportunity to explore the complex interactions within biological systems. These projects equip students with skills in data analysis, network construction, and biological interpretation, enabling them to contribute to our understanding of complex biological networks and their role in health and disease.

Project 8: Machine Learning in Bioinformatics


Machine learning gives bioinformatics practical tools for analyzing biological data and extracting insights from it. This project idea offers opportunities for both undergraduate (UG) and postgraduate (PG) students to explore machine learning in the context of bioinformatics, ranging from introductory concepts to advanced techniques.

Undergraduate Level: Introduction to Machine Learning in Bioinformatics

At the undergraduate level, students can begin by learning the basics of machine learning and applying them to a simple bioinformatics problem, such as predicting protein function. This project provides an introduction to the principles of machine learning and its applications in biology.

Machine Learning Fundamentals: Familiarize yourself with the fundamentals of machine learning, including supervised and unsupervised learning, classification, and regression.
Data Collection: Obtain a dataset relevant to a bioinformatics problem. For instance, you can use protein sequence data with known functions.
Data Preprocessing: Clean and preprocess the data, addressing missing values, feature scaling, and data transformation as needed.
Feature Selection: Identify relevant features (e.g., sequence motifs, physicochemical properties) that may be predictive of the target variable (protein function).
Model Selection: Choose a suitable machine learning algorithm (e.g., decision trees, support vector machines) for classification or regression based on the problem.
Model Training: Train the machine learning model on the labeled dataset, using a portion of the data for training and the rest for validation.
Model Evaluation: Assess the model’s performance using appropriate metrics (e.g., accuracy, F1-score) and visualize the results.
Interpretation: Interpret the model’s predictions and understand which features are most important for the prediction task.
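
The steps above can be sketched end to end in plain Python. This is a minimal illustration, not a recommended model: the sequences and the two labels ("membrane" and "soluble") are invented toy data standing in for a real annotated dataset, the features are simple amino acid composition, and a nearest-centroid classifier is swapped in for the decision trees or support vector machines mentioned above so the example needs no external library.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy sequences and labels, made up for illustration only.
TRAIN = [
    ("LLVVIIAALLFF", "membrane"),
    ("IVLLAAFFVVLL", "membrane"),
    ("DDEEKKRRDDEE", "soluble"),
    ("KKEEDDRRKKEE", "soluble"),
]
VALIDATION = [
    ("LLIIVVAAFFLV", "membrane"),
    ("EEKKDDRREEKD", "soluble"),
]


def composition(seq):
    """Feature vector: fraction of each of the 20 amino acids."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def fit_centroids(vectors, labels):
    """Average the feature vectors of each class."""
    sums = {}
    sizes = Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {lab: [s / sizes[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, vec):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], vec))


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    model = fit_centroids([composition(s) for s, _ in TRAIN],
                          [lab for _, lab in TRAIN])
    y_true = [lab for _, lab in VALIDATION]
    y_pred = [predict(model, composition(s)) for s, _ in VALIDATION]
    print("accuracy:", accuracy(y_true, y_pred))
    print("F1 (membrane):", f1_score(y_true, y_pred, "membrane"))
```

With real data, the train and validation sets would come from a random (ideally stratified) split rather than hand-picked lists, and the class centroids give a first view of which amino acids separate the classes.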

Postgraduate Level: Advanced Machine Learning in Genomics and Metagenomics

For postgraduate students, the project can delve into more advanced machine learning techniques, specifically applying deep learning to genomics or metagenomics classification problems.

Deep Learning for Genomics: Learn about deep learning architectures like convolutional neural networks (CNNs) or recurrent neural networks (RNNs) and their applications in genomics. Explore tasks like DNA sequence classification or gene expression prediction.
Metagenomics Classification: Work with metagenomic data and apply advanced machine learning techniques to classify microbial communities or detect pathogens in metagenomic samples.
Model Tuning: Experiment with hyperparameter tuning, model ensembles, or transfer learning to optimize the performance of deep learning models on bioinformatics datasets.
Interpretability: Investigate methods for explaining deep learning predictions in genomics or metagenomics. Understand how specific features or regions in sequences influence the model’s decisions.
Publication and Presentation: Share your findings through research publications or presentations at scientific conferences, contributing to the growing field of machine learning in bioinformatics.
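
To make the deep learning and interpretability steps more concrete, the sketch below shows what a single convolutional filter does to a one-hot encoded DNA sequence, and how mutating each base in turn reveals which positions drive the output. It is a simplified illustration in plain Python: the filter weights are set by hand to favour the motif TATA (in a trained CNN they would be learned), and the example sequence, the function names, and the scoring scheme are assumptions made for the example.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element vectors (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def motif_kernel(motif):
    """Hand-set filter: +1 for the motif base at each position, -1 otherwise."""
    return [[1.0 if b == base else -1.0 for b in BASES] for base in motif]


def conv1d_scan(encoded, kernel):
    """Slide the filter along the sequence and return one score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(len(BASES)):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(total)
    return scores


def relu(values):
    return [max(0.0, v) for v in values]


def max_activation(seq, kernel):
    """ReLU followed by global max pooling, as in a small CNN."""
    return max(relu(conv1d_scan(one_hot(seq), kernel)))


def mutation_importance(seq, kernel):
    """In silico mutagenesis: largest drop in activation per position."""
    baseline = max_activation(seq, kernel)
    importance = []
    for i, original in enumerate(seq):
        mutated = [
            max_activation(seq[:i] + alt + seq[i + 1:], kernel)
            for alt in BASES if alt != original
        ]
        importance.append(baseline - min(mutated))
    return importance


if __name__ == "__main__":
    sequence = "GGTATACC"  # hypothetical example sequence
    kernel = motif_kernel("TATA")
    print("window scores:", conv1d_scan(one_hot(sequence), kernel))
    print("importance:", mutation_importance(sequence, kernel))
```

Positions inside the matched motif show a drop in activation when mutated, while flanking positions do not; applied to a trained model, the same procedure highlights the sequence regions that influence its decisions.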

Machine learning projects in bioinformatics offer an opportunity to apply data-driven approaches to biological problems. These projects equip students with skills in data preprocessing, model selection, and interpretation, enabling them to make meaningful contributions to understanding biological systems through advanced machine learning techniques.

Bioinformatics offers a vast array of project possibilities for students at both undergraduate and postgraduate levels. These project ideas not only build bioinformatics skills but also contribute to ongoing research in fields like genomics, proteomics, and drug discovery. Whether you are just starting your academic journey or pursuing advanced studies, they can help you explore the field and make meaningful contributions to it. Remember, the key to a successful bioinformatics project lies in curiosity, diligence, and a willingness to embrace interdisciplinary challenges.



