This project applies the oneAPI AI Analytics Toolkit to medical science. We will explore single-cell data (e.g., single-cell RNA sequencing, scRNA-seq).
We will port Clustergrammer2 to the AI Analytics Toolkit. Clustergrammer2 is a web-based tool for visualizing and analyzing high-dimensional data (e.g., single-cell RNA sequencing data) as interactive and shareable heatmaps. It produces highly interactive visualizations that enable intuitive exploration of high-dimensional data and has several optional biology-specific features (e.g., enrichment analysis; see Biology-Specific Features) to facilitate the exploration of gene-level biological data.
We will explore gene expression data, which has valuable applications in studying diseases such as cancer. The information we get from exploring the heatmaps is useful for studying where gene mutations have occurred. Porting Clustergrammer2 to the AI Analytics Toolkit lets us interactively explore data from 2,700 PBMCs (peripheral blood mononuclear cells) obtained from a 10x Genomics dataset. We will use Intel Optimized Python from the AI Analytics Toolkit and run the programs on Intel DevCloud.
We will also use an external dataset for exploration, known as CIBERSORT. This dataset provides an estimate of the abundances of a number of cell types in a mixed population using gene expression data. We will load the data in a sparse matrix format. The dataset consists of 32 thousand genes and 2,700 single cells.
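As a rough illustration of loading expression data in a sparse matrix format, here is a minimal sketch. It assumes the counts are stored in a Matrix Market (.mtx) file with genes as rows and cells as columns. The file name, the toy values, and the choice of SciPy are assumptions made for the example, not details taken from this project.

```python
import os
import tempfile

import numpy as np
from scipy import io, sparse

# Hypothetical toy counts: 5 genes (rows) x 4 cells (columns), mostly zeros.
dense_counts = np.array([
    [0, 3, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 0, 0],
    [0, 5, 1, 0],
    [0, 0, 0, 4],
])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Assumed file name; a real dataset would already provide this file.
    mtx_path = os.path.join(tmp_dir, "matrix.mtx")
    # Write the toy matrix in Matrix Market format so there is a file to load.
    io.mmwrite(mtx_path, sparse.coo_matrix(dense_counts))
    # mmread returns a COO sparse matrix; convert to CSC for fast per-cell (column) access.
    gex = sparse.csc_matrix(io.mmread(mtx_path))

n_genes, n_cells = gex.shape
density = gex.nnz / (n_genes * n_cells)
print(f"{n_genes} genes x {n_cells} cells, {gex.nnz} non-zero entries, density {density:.2f}")
```

Only the non-zero entries are stored, which matters for single-cell data because most gene/cell combinations have a count of zero.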
Using Intel Optimized Python, we will normalize the dataset (i.e., the gene expression (GEX) data) and find the top-expressing genes. We will then apply the ArcSinh transform and Z-score normalization. After that, we will load the data into the Clustergrammer2 build that we ported to the AI Analytics Toolkit and explore the resulting interactive heatmaps.
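The preprocessing steps described above can be sketched with NumPy as follows. This is an illustrative sketch rather than the project's actual code. The function names, the per-cell target sum of 10,000, the ArcSinh cofactor of 5, ranking genes by total expression, and computing the Z-score per gene across cells are all assumptions made for the example. Loading the result into Clustergrammer2 is not shown.

```python
import numpy as np


def normalize_per_cell(counts, target_sum=10_000):
    """Scale each cell (column) so that its counts sum to target_sum (assumed value)."""
    totals = counts.sum(axis=0, keepdims=True).astype(float)
    totals[totals == 0] = 1.0  # avoid dividing by zero for empty cells
    return counts / totals * target_sum


def top_expressing_genes(expr, n_top):
    """Return row indices of the n_top genes with the highest total expression."""
    gene_totals = expr.sum(axis=1)
    return np.argsort(gene_totals)[::-1][:n_top]


def arcsinh_transform(expr, cofactor=5.0):
    """Apply the inverse hyperbolic sine after dividing by a cofactor (assumed value)."""
    return np.arcsinh(expr / cofactor)


def zscore_genes(expr):
    """Z-score each gene (row) across cells: subtract the mean, divide by the std."""
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # leave constant genes at zero instead of dividing by zero
    return (expr - mean) / std


# Hypothetical toy data: 50 genes x 20 cells of random counts.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(50, 20))

normalized = normalize_per_cell(counts)
top_idx = top_expressing_genes(normalized, n_top=10)
transformed = arcsinh_transform(normalized[top_idx])
scaled = zscore_genes(transformed)
print(scaled.shape)
```

The ArcSinh transform compresses large values in a way similar to a logarithm while remaining defined at zero, and the Z-score puts every gene on a comparable scale before the values are drawn as heatmap colors.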
Here are the features of Clustergrammer2: