Graph theory is a branch of discrete mathematics that is applied to the study of real-world networks and their properties, including biological networks. In our group we explore graph properties such as centrality measures, modularity and eigenspectra for the analysis of biological networks, viz., protein contact networks (residue-residue interaction networks), gene co-expression networks and metabolic networks.
In our group we use a graph-based approach for the analysis of protein structures and have developed NAPS, a webserver for network-based analysis of protein structures. NAPS facilitates quantitative and qualitative (visual) analysis of residue-residue interactions in single chains, protein complexes, modelled protein structures and trajectories (e.g., from molecular dynamics simulations). The network representation of proteins provides a systems approach to the topological analysis of complex three-dimensional structures, irrespective of secondary structure and fold type, and provides insights into the structure-function relationship.
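As a concrete illustration of the network representation, the minimal Python sketch below builds a residue contact network from Cα coordinates and computes two of the centrality measures mentioned above. It assumes one node per residue, an unweighted edge whenever two Cα atoms lie within 7 Å, and a simple list-of-coordinates input; these choices and the function names are illustrative assumptions, not NAPS settings.

```python
import math
from collections import deque


def contact_network(coords, cutoff=7.0):
    """Link residues whose C-alpha atoms lie within `cutoff` angstroms of each other.

    coords: list of (x, y, z) tuples, one per residue (a hypothetical input format).
    Returns an adjacency dict: residue index -> set of contacting residue indices.
    """
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree_centrality(adj):
    """Number of contacts per residue, normalised by the maximum possible (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """(Reachable residues) / (sum of shortest-path lengths to them), using BFS."""
    closeness = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        closeness[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return closeness


# Toy chain of five residues with consecutive C-alpha atoms 3.8 A apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
adj = contact_network(coords)
print(sorted(adj[2]))                # [1, 3]
print(degree_centrality(adj)[2])     # 0.5
print(closeness_centrality(adj)[2])  # 0.6666666666666666
```

For a real structure, `coords` would be read from the Cα records of a PDB file; the network library NetworkX also provides tested implementations of these and many other centrality measures.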
Identifying Structural Repeats. An important problem in protein structure investigations is the study of structural periodicity, which suggests ways of ultra-molecular assembly for the formation of higher-order structure. Tandemly repeated structural motifs arrange within a protein to form highly stable super-secondary structural folds that provide a repertoire of cellular functions. Accordingly, defects in repeat proteins have been found in a number of human diseases. Repeats originate from intragenic duplication and recombination events and accumulate mutations during the course of evolution, which can render them undetectable at the sequence level. Using graph spectral approaches, we have developed algorithms for the identification of Ankyrin repeats (AnkPred) and of tandem structural repeats in proteins (PRIGSA). StRiPs, a database of structural repeats in proteins, is being developed; it hosts known protein repeat families as well as novel, previously uncharacterized protein repeat clusters. The structural repeat proteins in StRiPs can be analyzed at the structure and sequence levels, on the one hand to explore the evolutionary constraints on the sequence-structure relation, and on the other to understand the structure-function correlation in repeat proteins.
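The graph spectral idea can be illustrated, in a deliberately simplified form that is not the AnkPred or PRIGSA algorithm, by two steps: compute a per-residue spectral quantity from the contact graph (here eigenvector centrality, i.e. the leading eigenvector of the adjacency matrix, found by power iteration) and then look for periodicity in that profile (here the lag with the highest autocorrelation). The underlying assumption is that tandem copies of a structural unit give similar local profiles. The function names and the lag window are hypothetical; the 33-residue period in the example reflects the typical length of an ankyrin repeat.

```python
import math


def eigenvector_centrality(adj_matrix, iterations=500):
    """Leading eigenvector of a symmetric 0/1 adjacency matrix by power iteration.

    Iterating with (A + I) rather than A avoids the oscillation plain power iteration
    shows on bipartite graphs; A and A + I share the same leading eigenvector.
    """
    n = len(adj_matrix)
    x = [1.0] * n
    for _ in range(iterations):
        y = [x[i] + sum(adj_matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = math.sqrt(sum((p - ma) ** 2 for p in a))
    sb = math.sqrt(sum((q - mb) ** 2 for q in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def dominant_period(profile, min_lag, max_lag):
    """Lag in [min_lag, max_lag] at which the profile best matches a shifted copy of itself."""
    best_lag, best_r = None, -2.0
    for lag in range(min_lag, max_lag + 1):
        r = pearson(profile[:-lag], profile[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Synthetic per-residue profile with a 33-residue period, standing in for values
# that would come from eigenvector_centrality() on a real contact graph.
profile = [math.sin(2 * math.pi * i / 33) for i in range(165)]
lag, r = dominant_period(profile, min_lag=20, max_lag=50)
print(lag, round(r, 3))  # 33 1.0
```

Restricting the lag window below twice the expected period keeps multiples of the period (66, 99, ...) from competing with the fundamental repeat length.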
Identifying Structural Domains. Domain identification is an important problem in protein function analysis, as it forms the first step in the classification of proteins. We proposed a combination of top-down and bottom-up hierarchical clustering approaches for domain identification, since each approach on its own tends to over-cut or under-cut domains, resulting in incorrect domain assignments, especially when the domains are non-contiguous. The algorithm first decomposes the protein contact graph into spatially compact structural modules and then assembles them into true domains by analyzing the eigenvector corresponding to the leading eigenvalue of a modified adjacency matrix, called the modularity matrix.
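The spectral step can be illustrated with the leading-eigenvector method for the modularity matrix, B_ij = A_ij - k_i k_j / (2m), where A is the adjacency matrix, k_i the degree of node i and m the number of edges: nodes are split into two groups by the sign of their components in the eigenvector of B's largest eigenvalue. The sketch below performs a single such bisection on a toy graph; it does not reproduce the published combination of top-down and bottom-up clustering, and the function names are hypothetical.

```python
import math


def modularity_matrix(adj_matrix):
    """B_ij = A_ij - k_i * k_j / (2m) for an undirected, unweighted graph."""
    n = len(adj_matrix)
    k = [sum(row) for row in adj_matrix]
    two_m = sum(k)
    return [[adj_matrix[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]


def leading_eigenvector_split(adj_matrix, iterations=1000):
    """Split nodes into two groups by the sign of the leading eigenvector of B.

    B can have negative eigenvalues, so power iteration runs on B + c*I, where the
    Gershgorin bound c makes every eigenvalue non-negative; the shift leaves the
    eigenvectors unchanged, so the dominant one is B's leading eigenvector.
    """
    B = modularity_matrix(adj_matrix)
    n = len(B)
    c = max(sum(abs(v) for v in row) for row in B)
    # Every row of B sums to zero, so the all-ones vector is itself an eigenvector
    # of B; start from a non-uniform vector so the iteration can leave it.
    x = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        y = [sum(B[i][j] * x[j] for j in range(n)) + c * x[i] for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    group_a = [i for i in range(n) if x[i] >= 0]
    group_b = [i for i in range(n) if x[i] < 0]
    return group_a, group_b


# Toy "protein": two compact triangles (0-1-2 and 3-4-5) joined by one contact, 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
print(leading_eigenvector_split(A))  # the two triangles land in different groups
```

In a domain-assembly setting the nodes would be the structural modules (or residues) of the contact graph, and further splits or merges would be decided by whether they increase modularity.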
Related Publications:
1. NAPS: Network Analysis of Protein Structures, Broto Chakrabarty and Nita Parekh,
Nucleic Acids Res., 44(W1), W375-82 (2016).
2. PRIGSA: Protein Repeat Identification by Graph Spectral Analysis, Broto Chakrabarty
and Nita Parekh, J. Bioinform. Comput. Biol., 12(6), 1442009 (2014).
3. Identifying Tandem Ankyrin Repeats in Protein Structures, Broto Chakrabarty and Nita Parekh,
BMC Bioinformatics, 15(1), 6599 (2014).
4. Graph Centrality Analysis of Structural Ankyrin Repeats, Broto Chakrabarty and Nita Parekh,
International Journal of Computer Information Systems and Industrial Management Applications, 6(6),
305-314 (2014). ISSN: 2150-7988.
5. Analysis of Graph Centrality Measures for Identifying Ankyrin Repeats, Broto Chakrabarty and
Nita Parekh, in IEEE proceedings of World Congress on Information and Communication Technologies (WICT),
156-161 (2012). ISBN: 978-1-4673-4806-5, DOI: 10.1109/WICT.2012.6409067.
6. Identifying Structural Repeats in Proteins using Graph Centrality Measures, Ruchi Jain, Hari
Krishna Yalamanchili and Nita Parekh, in the IEEE proceedings of World Congress on Nature & Biologically
Inspired Computing (NaBIC), 110-115 (2009).
7. Graph Spectral Approach for Identifying Protein Domains, Hari Krishna Yalamanchili and Nita Parekh,
in “Bioinformatics and Computational Biology”, S. Rajasekaran (Ed.), Lecture Notes in Computer Science,
vol. LNBI 5462, 437-448, Springer-Verlag Berlin Heidelberg (2009).
8. Computational Approaches to Protein Domain Identification, Nita Parekh, chapter in the book
titled “Applied Computational Biology and Statistics in Biotechnology and Bioinformatics”, ed. A.K. Roy,
Vol. 1, Chap. 27, 677-696, New India Publishing Agency, New Delhi (2012). ISBN: 9789380235929.
Presentations in Conferences/Workshops:
1. Graph based Identification of Structural Repeats in Proteins, Broto Chakrabarty and Nita Parekh,
poster presentation in International Symposium on Chemistry with Computers, 18-19 Jan,
IICT-Hyderabad (2014).
2. PRIGSA: Protein Repeat Identification by Graph Spectral Analysis, oral presentation in
International Conference on Genome Informatics (GIW ISCB-ASIA 2014), 15-17 Dec, Odaiba, Tokyo, Japan (2014).
3. Graph based Identification of Structural Repeats in Proteins, Broto Chakrabarty and Nita Parekh,
poster presentation in International Conference on Bioinformatics (InCoB), 20-22 Sept, Taicang, China (2013).
4. Identifying Structural Tandem Repeats in Proteins by Graph Spectral Analysis, Broto Chakrabarty and
Nita Parekh, poster presentation in International Conference on Biomolecular Forms and Functions: A celebration
of 50 years of the Ramachandran map, 8-11 Jan, IISc Bangalore (2013).
5. Analysis of Graph Centrality Measures for Identifying Ankyrin Repeats, Broto Chakrabarty and Nita Parekh,
oral presentation in World Congress on Information and Communication Technologies (WICT), 30 Oct-2 Nov,
Trivandrum (2012).
6. Identification of Ankyrin Repeats in Three-Dimensional Protein Structures, Broto Chakrabarty and
Nita Parekh, poster and oral presentation at Accelerating Biology 2012, CDAC, Pune, 15-17 February (2012).
7. Identifying Structural Repeats in Proteins using Graph Centrality Measures, Ruchi Jain, Hari Krishna
Yalamanchili and Nita Parekh, poster presentation at World Congress on Nature & Biologically Inspired
Computing (NaBIC), Coimbatore, 9-11 Dec. (2009).
8. Graph Spectral Approach for Identifying Protein Domains, Hari Krishna Yalamanchili and Nita Parekh,
poster presentation at Symposium on Theoretical and Mathematical Biology, IISER, Pune, 10-11 October (2009).
9. Graph Spectral Approach for Identifying Protein Domains, Hari Krishna Yalamanchili and Nita Parekh,
poster presentation at the 1st International Conference on Bioinformatics and Computational Biology (BICoB),
New Orleans, Louisiana, USA, 8-10 April (2009).
10. Graph Spectral Approach for Identifying Protein Domains, Hari Krishna Yalamanchili and Nita Parekh,
poster presentation at National Symposium on Cellular and Molecular Biophysics, CCMB, Hyderabad,
22-24 January (2009).
Rice is an economically important food crop, both in India and in the rest of the world. Currently, food crop production is hampered by various stress conditions, such as abiotic stresses (e.g., drought, salinity and cold) and biotic stresses (e.g., weeds, insects and plant pathogens). Stress perception is translated into a cascade of molecular events involving a network of transcription factors and other early stress-responsive genes. With large amounts of high-throughput data now available at the transcriptome, proteome and metabolome levels, we are interested in a systems-level understanding of the stress response. Using such data together with bioinformatic resources focused on plants, one can study correlations within and across these molecular levels.
We are currently interested in the construction and analysis of stress-specific co-expression networks using microarray and RNA-seq data. As crops in the field are exposed to several stress conditions, it is of interest to identify the gene signatures and processes that are unique to, or shared across, the various stresses. With over 50% of rice genes lacking annotation for biological processes, condition-dependent co-expression networks of rice can help in the functional annotation of uncharacterized genes. In addition, network neighbourhoods conserved with model species such as Arabidopsis can be used to identify evolutionarily conserved processes.
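A minimal sketch of co-expression network construction and guilt-by-association annotation follows. It assumes Pearson correlation across samples, an edge wherever the absolute correlation is at least 0.8, and annotation transfer by counting terms among a gene's network neighbours; the threshold, gene names, expression values and annotation labels are all hypothetical and are not taken from the group's studies.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def coexpression_network(expression, threshold=0.8):
    """Link genes whose |Pearson r| across samples is at least `threshold`.

    expression: dict gene_id -> expression values, in the same sample order for all genes.
    Returns an adjacency dict gene_id -> set of co-expressed gene_ids.
    """
    network = {g: set() for g in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            network[g1].add(g2)
            network[g2].add(g1)
    return network


def neighbour_annotations(network, gene, annotations):
    """Count annotation terms among a gene's network neighbours (guilt by association)."""
    return Counter(term for nbr in network[gene] for term in annotations.get(nbr, ()))


# Hypothetical log-expression values for four genes across five stress time points.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 5.1],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [2.0, 1.0, 2.0, 1.0, 2.0],
}
annotations = {
    "geneA": ["response to water deprivation"],
    "geneC": ["response to water deprivation"],
}
net = coexpression_network(expression)
print(sorted(net["geneB"]))  # ['geneA', 'geneC']
print(neighbour_annotations(net, "geneB", annotations))
```

Here the unannotated geneB would be tentatively associated with the term shared by its neighbours; in practice the threshold is usually chosen from the data (for example, by checking network topology) rather than fixed in advance.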
To aid systems-level studies, we are integrating stress-specific transcriptomic networks with protein-protein interaction and metabolic pathway information for rice. This resource is being built on Neo4j, a highly scalable native graph database, and will aid functional analysis and network visualization in rice research.
Related Publications:
1. Meta-analysis of Drought-tolerant Genotypes in Oryza sativa: A Network-based Approach, Sanchari Sircar
and Nita Parekh, (submitted, in review).
2. Protocol for Co-expression Network Construction and Stress-responsive Expression Analysis in
Brachypodium, Sanchari Sircar and Nita Parekh, in Methods in Molecular Biology, Springer (in press).
3. Functional characterization of drought-responsive modules and genes in Oryza sativa: a network-based
approach, Sanchari Sircar and Nita Parekh, Front. Genet., 6, 256 (2015).
Presentations in Conferences/Workshops:
1. Meta-analytic Study of Drought-Tolerant Rice Genotypes: A Systems-based Approach, Sanchari Sircar and
Nita Parekh, poster presentation in the International Conference on Statistics & Big Data Bioinformatics,
20-23 Nov, ICRISAT, Hyderabad (2016).
2. Meta-analysis of Drought-Tolerant Rice Genotypes: A Network-based Approach, Sanchari Sircar and
Nita Parekh, oral presentation at the 7th Edition of YRLS Conference, 18-20 May, Institut Pasteur,
Paris, France (2016).
3. Co-Expression Network Analysis of Rice under Drought Stress: Identifying Functional Modules and Genes,
Sanchari Sircar and Nita Parekh, poster presentation in the Indo-French seminar on "Women in Science"
through CEFIPRA, 3-5 Feb, IISc, Bangalore (2015).
4. Gene Co-expression Network Analysis of Oryza sativa under Abiotic Stress, Sanchari Sircar and
Nita Parekh, poster presentation in the Symposium on Accelerating Biology, 18-20 Feb, CDAC, Pune (2014).
Various studies have associated genomic variants with rare and genetic diseases, including cancer, through the functional changes they induce in genes and regulatory regions. These variants include sequence variants, viz., single nucleotide variants (SNVs) and small indels, and structural variants (SVs), viz., duplications, deletions, inversions and translocations. With the advent of next-generation sequencing (NGS) technology, there is now considerable interest in understanding the role of genomic variants in the molecular mechanisms underlying pathogenesis. Copy number variations (CNVs) are a form of SV in which large genomic regions (50 bp to 1 Mbp) are present in an abnormal number of copies in a cell. The importance of CNVs is underscored by their high prevalence in the human genome (~10%) and by the observation that approximately half of the reported CNVs overlap with protein-coding genes. SNVs and small indels (~2-50 bp) are known to play an equally significant role in influencing human traits and contributing to disease. In our group we are interested in the identification and analysis of these genomic variants. We have developed SVINGS, an integrated pipeline with a modular framework, for the identification and analysis of CNVs using NGS data. We are in the process of developing separate modules for the detection of small indels and SNVs, which will be integrated into SVINGS. We are currently investigating the role of these genomic variants in cancer pathogenesis and their potential for identifying biomarkers.
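Read-depth analysis is one common way to detect CNVs from NGS data: reads are counted in fixed-size genomic bins, each bin's count is compared with a reference sample, and runs of bins with a raised or lowered log2 ratio are reported as gains or losses. The sketch below illustrates only that idea and is not the SVINGS implementation; the ±0.4 log2 thresholds, the three-bin minimum run and the function name are assumptions, and practical callers typically also correct for GC content and mappability and use dedicated segmentation algorithms.

```python
import math
import statistics


def call_cnv_segments(test_counts, ref_counts, gain=0.4, loss=-0.4, min_bins=3):
    """Call gains/losses from per-bin read counts of a test and a reference sample.

    Returns (first_bin, last_bin, "gain" or "loss") for every run of at least
    `min_bins` consecutive bins whose median-centred log2 ratio passes a threshold.
    """
    raw = []
    for t, r in zip(test_counts, ref_counts):
        if r == 0:
            raw.append(None)           # empty reference bin: no information
        elif t == 0:
            raw.append(-math.inf)      # no test reads at all: treated as a loss
        else:
            raw.append(math.log2(t / r))

    # Centring on the median assumes most bins are copy-neutral; it also cancels
    # the difference in sequencing depth (library size) between the two samples.
    finite = [x for x in raw if x is not None and math.isfinite(x)]
    centre = statistics.median(finite) if finite else 0.0

    states = []
    for x in raw:
        if x is None:
            states.append(None)
        elif x - centre >= gain:
            states.append("gain")
        elif x - centre <= loss:
            states.append("loss")
        else:
            states.append(None)

    # Merge runs of identical non-neutral states into segments.
    segments, start = [], 0
    for i in range(1, len(states) + 1):
        if i == len(states) or states[i] != states[start]:
            if states[start] is not None and i - start >= min_bins:
                segments.append((start, i - 1, states[start]))
            start = i
    return segments


# Hypothetical read counts in 12 equal-sized bins.
reference = [100] * 12
tumour = [100, 100, 100, 200, 210, 190, 200, 100, 100, 50, 45, 55]
print(call_cnv_segments(tumour, reference))  # [(3, 6, 'gain'), (9, 11, 'loss')]
```

Requiring several consecutive bins is a crude stand-in for segmentation: it suppresses isolated noisy bins at the cost of missing CNVs shorter than `min_bins` bins.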
Related Publications:
1. Copy Number Variation Detection Workflow using Next Generation Sequencing Data, Prashanthi Dharanipragada
and Nita Parekh, in IEEE proceedings of International Conference on Bioinformatics and Systems Biology (BSB-2016),
4-6 March 2016, IIIT Allahabad.
Presentations in Conferences/Workshops:
1. SVINGS: Structural Variants Identification in Next Generation Sequence data, Prashanthi Dharanipragada,
Sriharsha Vogeti and Nita Parekh, oral presentation at 8th Edition of Young Researchers in Life Science
(YRLS) Conference, 15-17 May, Institut Imagine, Paris, France (2017).
2. Copy Number Variation Detection Workflow using Next Generation Sequencing Data,
Prashanthi Dharanipragada and Nita Parekh, oral presentation in International Conference on Bioinformatics
and Systems Biology (BSB-2016), 4-6 March, IIIT Allahabad (2016).
3. Functional Analysis of Copy Number Variations in DLBCL Pathogenesis, Prashanthi Dharanipragada and
Nita Parekh, poster presentation at Accelerating Biology 2016: Decoding the Deluge, 19-21 January, CDAC,
Pune (2016).
4. Detection of Copy Number Variations from Next Generation Sequencing Data, Sriharsha Vogeti,
Prashanthi Dharanipragada, Anwesha Mohapatra, Shanta Pendkar and Nita Parekh, poster presentation in
International Conference on Systems Biology (ICSB), Nov 23-24, Biopolis, Singapore (2015).
5. Copy Number Variation Analysis of Diffuse Large B-Cell Lymphoma (DLBCL) Subtypes, Prashanthi Dharanipragada
and Nita Parekh, poster presentation at 19th International Conference on Research in Computational Molecular
Biology (RECOMB), 12-15 April, Warsaw, Poland (2015).
6. Copy Number Variation Analysis of Diffuse Large B-Cell Lymphoma (DLBCL) Subtypes, Prashanthi Dharanipragada
and Nita Parekh, poster presentation at Big Data Analysis and Translation in Disease Biology, Indo-US Bilateral
Conference-cum-Workshop, 18-22 January, JNU, New Delhi (2015).
The metabolome is the collection of all metabolites, the end products of cellular processes, in a biological cell, tissue, organ or organism; the systematic study of these small molecules is called metabolomics. We have developed an integrated web-based platform, Computational Core of Plant Metabolomics (CCPM), that provides a data repository together with analysis and visualization of mass spectral data. It supports end-to-end analysis of LC/GC-MS data, covering raw data capture, data pre-processing, data pre-treatment, and statistical and pathway analysis, with parameters that can be customized from the web interface.
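Pre-treatment makes raw peak intensities comparable across samples before statistical analysis. The sketch below chains three commonly used steps: total-sum normalisation, log2 transformation and Pareto scaling (mean-centring each metabolite and dividing by the square root of its standard deviation). The choice and order of these steps, the positive-intensity requirement and the function name are illustrative assumptions, not CCPM's documented defaults.

```python
import math
import statistics


def pretreat(intensities):
    """Normalise, log-transform and Pareto-scale a samples x metabolites intensity table.

    intensities: list of rows (one per sample, at least two samples), each listing
    strictly positive peak intensities for the same metabolites in the same order.
    """
    if any(v <= 0 for row in intensities for v in row):
        raise ValueError("sketch assumes strictly positive intensities; impute missing peaks first")

    # 1. Total-sum normalisation: rescale each sample so its intensities sum to 1.
    normalised = []
    for row in intensities:
        total = sum(row)
        normalised.append([v / total for v in row])

    # 2. log2 transform: reduces right skew and makes fold changes additive.
    logged = [[math.log2(v) for v in row] for row in normalised]

    # 3. Pareto scaling per metabolite (column): (x - mean) / sqrt(standard deviation).
    scaled_columns = []
    for column in zip(*logged):
        mean = statistics.fmean(column)
        sd = statistics.stdev(column)
        scaled_columns.append([(x - mean) / math.sqrt(sd) if sd > 0 else 0.0 for x in column])

    # Transpose back to one row per sample.
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical peak intensities: three samples x two metabolites.
table = [[100.0, 300.0], [200.0, 600.0], [50.0, 50.0]]
for row in pretreat(table):
    print([round(v, 3) for v in row])
```

The first two samples differ only by a constant factor, so normalisation makes them identical; Pareto scaling then shrinks, but does not remove, the influence of high-variance metabolites, a middle ground between no scaling and unit-variance scaling.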
The metabolic network, which includes all metabolites and enzyme-catalyzed reactions occurring within a living cell as well as the interactions between reactants and enzymes, is an abstract representation of cellular metabolism. The topology of metabolic networks reflects the dynamics of their formation and evolution, and graph theory has proved useful in analyzing it. Graph centrality measures help identify important metabolites and enzymes, while modularity measures help identify pathways conserved over evolution. We are presently carrying out graph-based analysis of substrate-centric and enzyme-centric metabolic networks of Arabidopsis thaliana.
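Substrate-centric and enzyme-centric networks can both be derived from one reaction list. The sketch below follows one common convention, which may differ from the construction used in the publications listed here: the substrate-centric graph has an edge from each substrate of a reaction to each of its products, and the enzyme-centric graph has an edge from enzyme e1 to enzyme e2 when a product of e1's reaction is a substrate of e2's. Currency metabolites such as ATP and ADP are excluded, and degree serves as a simple centrality measure. The reaction list (the first steps of glycolysis) and the function names are illustrative.

```python
from collections import defaultdict

# Illustrative reactions: (enzyme, substrates, products), first steps of glycolysis.
REACTIONS = [
    ("hexokinase", {"glucose", "ATP"}, {"glucose-6-phosphate", "ADP"}),
    ("phosphoglucose isomerase", {"glucose-6-phosphate"}, {"fructose-6-phosphate"}),
    ("phosphofructokinase", {"fructose-6-phosphate", "ATP"}, {"fructose-1,6-bisphosphate", "ADP"}),
]


def substrate_centric(reactions, currency=frozenset()):
    """Directed metabolite graph: substrate -> product for every reaction, skipping currency metabolites."""
    graph = defaultdict(set)
    for _enzyme, substrates, products in reactions:
        for s in substrates - currency:
            for p in products - currency:
                graph[s].add(p)
    return graph


def enzyme_centric(reactions, currency=frozenset()):
    """Directed enzyme graph: e1 -> e2 when a non-currency product of e1 is a substrate of e2."""
    graph = defaultdict(set)
    for e1, _s1, products in reactions:
        for e2, substrates, _p2 in reactions:
            if e1 != e2 and (products & substrates) - currency:
                graph[e1].add(e2)
    return graph


def degree(graph):
    """Total (in + out) degree of every node in a directed adjacency dict."""
    deg = defaultdict(int)
    for u, targets in graph.items():
        for v in targets:
            deg[u] += 1
            deg[v] += 1
    return dict(deg)


currency = frozenset({"ATP", "ADP"})
enzymes = enzyme_centric(REACTIONS, currency)
print(sorted(enzymes["hexokinase"]))  # ['phosphoglucose isomerase']
print(degree(substrate_centric(REACTIONS, currency))["fructose-6-phosphate"])  # 2
```

Without the currency filter, ATP and ADP would become hubs linking otherwise unrelated reactions and would dominate any centrality ranking.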
Related Publications:
1. Construction and Analysis of Enzyme Centric Network of A. thaliana using Graph Theory,
Kasthuribai Viswanathan and Nita Parekh, in A. N. Averkin, D. I. Ignatov, S. Mitra, J. Poelmans, V. B. Tarasov (Eds.),
proceedings of International Workshop on Soft Computing Applications and Knowledge Discovery (SCAKD’11),
125-134 (2011). ISSN: 1613-0073.
2. Construction and Analysis of Metabolic Network of Arabidopsis thaliana Pathways, Kasthuribai Viswanathan
and Nita Parekh, in proceedings of The 12th International Conference on Bioinformatics & Computational Biology
(BIOCOMP'11), Eds. Hamid R. Arabnia, Quoc-Nam Tran, 367-372 (2011). ISBN: 1-60132-172-4.
Presentations in Conferences/Workshops:
1. CCPM V3.4: Towards Collaborative Metabolomics, I. Ghosh, A. Mitra, Nita Parekh, V. Pudi,
B. Chakrabarty, P. Dharanipragada, R. Gurrapu, K. Narendra Babu, S. Manoj Kumar, S. R. Kiran Raj, M. Manoj Kumar,
V.P. Srivani, V. Dharma Teja, poster presentation in 3rd International Plant Physiology Congress (IPPC-2015),
11-14 Dec, JNU, New Delhi (2015).
2. CCPM V3.4: Towards Collaborative Metabolomics, I. Ghosh, A. Mitra, Nita Parekh, V. Pudi,
B. Chakrabarty, P. Dharanipragada, R. Gurrapu, K. Narendra Babu, S. Manoj Kumar, S. R. Kiran Raj, M. Manoj Kumar,
V.P. Srivani, V. Dharma Teja, poster presentation in 7th Annual Meeting of Proteomics Society - India (PSI),
3-6 Dec, VIT University, Vellore (2015).
3. Comparative Analysis of Metabolic Networks, Shubhi Gupta and Nita Parekh, poster presentation at International
Conference on Frontiers of Interface between Statistics and Sciences, CR Rao AIMSCS, Hyderabad, 30 Dec 2009-2 Jan 2010.