Melanie Bahlo-Projects

Projects

In-silico gene prioritisation using brain-specific gene expression data

Following on from our published research into epileptic encephalopathy genes, we have been applying data cleaning methods to several large, complex, brain-specific gene expression data sets. Removing artifacts from these valuable data sets will allow us to use them to prioritise variants found in gene discovery projects. The prioritised genes can then be taken forward for further examination in collaborative studies.

Team members: Dr Saskia Freytag, Vesna Lukic, Karen Oliver

Reference: Oliver KL, Lukic V, Thorne NP, Scheffer I, Berkovic S, Bahlo M. Harnessing gene expression networks to prioritize candidate epileptic encephalopathy genes. PLoS One. 2014 Jul 9;9(7):e102079. PMID: 25014031

Comparison of different microarray data cleaning methods in the context of gene-gene correlations using simulated data. The simulated data consisted of 1000 arrays with 3000 measured gene expressions obscured by moderate noise. In each panel the correlations above the diagonal represent the true underlying correlations between the genes, whereas the correlations below the diagonal represent the correlations estimated from the data treated with a particular cleaning procedure. The first panel shows correlations estimated from the untreated data; the second panel shows the effect of the removal of unwanted variation procedure using 2000 negative control genes. The last two panels focus on commonly applied methods: background correction, and background correction plus quantile normalization. In each panel the first six genes are strongly expressed, the second six genes have low expression, and the last six genes are not expressed and are thus uncorrelated. It is apparent that removal of unwanted variation is the only procedure able to recover the true gene-gene correlations in noisy data.
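
The idea behind using negative control genes can be illustrated with a small simulation. The sketch below is a much simplified stand-in for removal of unwanted variation and is not the procedure used to produce the figure: it assumes a single unwanted factor, estimates it as the per-array mean of the control genes, and regresses it out of each gene. All names, sample sizes and effect sizes are assumptions chosen for illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def regress_out(y, w):
    """Residuals of y after least-squares regression on w (with intercept)."""
    n = len(y)
    mw, my = sum(w) / n, sum(y) / n
    beta = sum((a - mw) * (b - my) for a, b in zip(w, y)) / sum((a - mw) ** 2 for a in w)
    return [b - my - beta * (a - mw) for a, b in zip(w, y)]

rng = random.Random(0)
n_arrays = 500

# Hypothetical unwanted factor, one value per array (for example a batch effect).
unwanted = [rng.gauss(0, 1) for _ in range(n_arrays)]

# Two genes that are truly uncorrelated but both pick up the unwanted factor.
gene_a = [rng.gauss(0, 1) + 2 * u for u in unwanted]
gene_b = [rng.gauss(0, 1) + 2 * u for u in unwanted]

# Negative control genes carry no biological signal, only the unwanted factor plus noise.
controls = [[1.5 * u + rng.gauss(0, 1) for u in unwanted] for _ in range(50)]

# Estimate the unwanted factor as the per-array mean of the control genes.
unwanted_hat = [sum(c[i] for c in controls) / len(controls) for i in range(n_arrays)]

raw_corr = pearson(gene_a, gene_b)
clean_corr = pearson(regress_out(gene_a, unwanted_hat), regress_out(gene_b, unwanted_hat))
print(f"untreated: {raw_corr:.2f}, after adjustment: {clean_corr:.2f}")
```

In this toy setting the untreated data show a strong spurious correlation between the two genes, while the adjusted data return a correlation close to the true value of zero.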

 

Analysis methods for cell-free DNA for the detection of foetal anomalies and transplant rejection

Following on from our published research into plasma DNA sequencing, we have been developing and applying refined methods for describing and correcting read coverage bias in next-generation sequencing data. These corrections lead to more sensitive detection of changes in cell-free DNA profiles, allowing the detection of foetal genomic abnormalities and potentially the detection of transplant rejection, based on DNA extracted from blood samples.

This plot depicts the result of a cross-correlation analysis on the sequencing reads of two cell-free DNA samples. This pattern is evidence that DNA fragments occur in regularly placed clusters along the genome. The interval length of ~190bp between the correlation peaks corresponds to the distance between nucleosomes. 
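
A cross-correlation analysis of this kind can be sketched on simulated data. The example below is only an illustration under stated assumptions, not the analysis behind the plot: read start positions for two hypothetical samples are drawn around centres spaced 190 bp apart, and the cross-covariance of the two per-base count profiles is computed over a range of lags. The function names, genome length, read numbers and jitter are all invented for the example.

```python
import random

def coverage(n_reads, genome_len, spacing, jitter, rng):
    """Count simulated read starts per base, clustered around evenly spaced centres."""
    counts = [0] * genome_len
    centres = list(range(spacing // 2, genome_len, spacing))
    for _ in range(n_reads):
        pos = round(rng.choice(centres) + rng.gauss(0, jitter))
        if 0 <= pos < genome_len:
            counts[pos] += 1
    return counts

def cross_covariance(x, y, lag):
    """Mean product of the centred signals, with y shifted by `lag` bases."""
    n = len(x) - lag
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((x[i] - mx) * (y[i + lag] - my) for i in range(n)) / n

rng = random.Random(1)
# Two hypothetical samples sharing the same 190 bp cluster spacing.
sample_a = coverage(3000, 10000, 190, 15, rng)
sample_b = coverage(3000, 10000, 190, 15, rng)

# Scan lags around the expected spacing and report where the signal peaks.
profile = {lag: cross_covariance(sample_a, sample_b, lag) for lag in range(100, 281)}
peak_lag = max(profile, key=profile.get)
print("lag with the strongest cross-covariance:", peak_lag)
```

Because both simulated samples share the same cluster spacing, the strongest cross-covariance in the scanned window falls near the 190 bp spacing, mirroring the periodic peaks described above.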

 

Team members: Dineika Chandrananda, Peter Diakumis

Reference: Chandrananda D, Thorne NP, Ganesamoorthy D, Bruno DL, Benjamini Y, Speed TP, Slater HR, Bahlo M. Investigating and correcting plasma DNA sequencing coverage bias to enhance aneuploidy discovery. PLoS One. 2014 Jan 29;9(1):e86993. PMID: 24489824

Discovery of expanded repeats with whole genome sequencing data

We know of over twenty neurological diseases caused by expansions of short repetitive runs of DNA. These include several forms of ataxia and Huntington’s disease. It is very likely that other neurological diseases are also caused by repeat expansions. However, expansions are very difficult to discover because of their repetitive nature, and new methods are required to identify them. We are developing tools that scan databases of known short repeat loci to identify individuals who show evidence of expansions, and we will apply these tools to cohorts of unsolved patients with genetic disorders.
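
One simple signal that such a scan can look for is a read containing an unusually long run of a repeat unit. The sketch below is a hypothetical illustration and not the tool under development: the function names, the example reads and the threshold of 20 units are assumptions, and real data would need to handle sequencing errors, read length limits and alignment.

```python
import re

def longest_repeat_run(sequence, unit):
    """Longest run of consecutive copies of `unit`, found by a left-to-right scan."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = (len(m.group(0)) // len(unit) for m in pattern.finditer(sequence))
    return max(runs, default=0)

def flag_expanded(reads, unit, min_units):
    """Names of reads whose longest run of `unit` reaches `min_units` copies."""
    return [name for name, seq in reads.items()
            if longest_repeat_run(seq, unit) >= min_units]

# Invented reads: a short CAG tract, a long CAG tract, and no repeat at all.
reads = {
    "read_short": "ACGT" + "CAG" * 5 + "TTGA",
    "read_long": "ACGT" + "CAG" * 30 + "TTGA",
    "read_none": "ACGTTTGACCGTA",
}

flagged = flag_expanded(reads, "CAG", min_units=20)
print(flagged)
```

Only the read carrying 30 consecutive CAG units passes the illustrative threshold; the threshold itself would in practice depend on the locus and on the read length.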

Team member: Rick Tankard

Identification of identity by descent relationships with DNA data

Identity by descent (IBD) describes the sharing of DNA segments that individuals have inherited from a common ancestor. IBD can be used to infer hidden relationships. These relationships, when found, form the basis of many of our disease variant discovery methods. We have implemented and extended these methods to apply to X chromosome and next-generation sequencing data. This has led to the discovery of disease-causing variants in intellectual disability, and we are extending these methods for cohort IBD discovery as well as inherited copy number variant discovery.

Team members: Lyndal Henden, Dr Thomas Scerri

Detection of an IBD tract on the X chromosome in a pair of supposedly unrelated individuals, determined using posterior probability (dots) and the Viterbi algorithm (solid line). The dotted line in the graph shows the location of a gene in which a novel, identical mutation was found in the pair; this mutation is suspected to be causal for their X-linked intellectual disability.
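
The Viterbi step mentioned in the caption can be illustrated with a toy two-state hidden Markov model. The sketch below is an assumption-laden example, not the model used for the figure: the states, the transition and emission probabilities, and the coding of observations (1 when the pair's alleles match at a marker, 0 when they differ) are all invented for illustration, and the posterior probability calculation is not shown.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most likely state path for a discrete HMM, computed in log space."""
    scores = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            cur[s] = prev[best] + log_trans[best][s] + log_emit[s][obs]
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

states = ["nonIBD", "IBD"]
log_start = {s: math.log(0.5) for s in states}
# Assumed probability of 0.95 of staying in the same state between markers.
log_trans = {a: {b: math.log(0.95 if a == b else 0.05) for b in states} for a in states}
# Assumed emissions: alleles nearly always match within an IBD tract.
log_emit = {
    "nonIBD": {1: math.log(0.6), 0: math.log(0.4)},
    "IBD": {1: math.log(0.99), 0: math.log(0.01)},
}

# Invented data: mixed matches, then 20 matching markers, then mixed again.
observations = [0, 1, 0, 0, 1, 0] + [1] * 20 + [0, 1, 0, 0, 1, 0]
path = viterbi(observations, states, log_start, log_trans, log_emit)
tract = [i for i, s in enumerate(path) if s == "IBD"]
print("IBD tract spans markers", tract[0], "to", tract[-1])
```

With these illustrative parameters the long run of matching markers is called as a single IBD tract, while isolated matches in the flanking regions are not, because entering and leaving the IBD state costs more than a single match gains.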