Queen's University
This project is maintained by EdRey05
Most of the tools developed are described below, each with a GIF preview and a link to the script/notebook (click on the title). The majority of these tools were designed to run within the ImageJ/Fiji software, which our lab commonly uses for data analysis; a few are notebooks made in Google Colab for additional tasks. Using these tools requires no programming experience and no software other than ImageJ.
For more information on how to use the scripts, check: Tutorials.
NOTE: If you are reading this on GitHub, check out the GitHub page for easier visualization. If you are already on the GitHub page (it has colours), you can click the “View on Github” button in the left box to access the main repository.
Context:
The EVOS M7000 imager takes pictures of a field of view (FOV) at different heights (Z-slices) and in different colours (channel blue, green, red…). However, it does not save the pictures as hyperstacks (a single file containing all slices and colours for each FOV). Instead, the equipment saves each slice of each colour separately, resulting in 5, 20 or more grayscale TIF files for the same FOV. We can put the images back together using the ImageJ software, since we do image analysis with it and it can run macros/scripts in several languages, including Jython (a Python implementation that runs on the Java platform).
Problems:
Solution:
os Python library to scan all the files of a folder, get the image names and extract the relevant information with string and path operations. Then I applied a special sorting to cluster together all TIFs of the same FOV.
IJ and ImagePlus from the ij library. This required logic controls to identify when to stack slices, when to merge all stacks into a composite, and when to save the hyperstack.
Preview of the script:
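As an illustration of the file-grouping step described in the solution, the sketch below parses each file name, groups the TIFs by FOV, and orders each group by channel and then by slice so the images are ready to be stacked. The naming scheme (`<fov>_z<slice>_c<channel>.tif`) and the function names are assumptions made for this example; the real EVOS file names and the actual script may differ. It uses only the standard library.

```python
import os
import re
from collections import defaultdict

# Hypothetical naming scheme (an assumption, not the real EVOS format):
#   <fov>_z<slice>_c<channel>.tif, e.g. "a_f01_z02_c1.tif"
NAME_PATTERN = re.compile(
    r"^(?P<fov>.+)_z(?P<z>\d+)_c(?P<c>\d+)\.tif$", re.IGNORECASE
)


def group_by_fov(filenames):
    """Return {fov: [(channel, z_slice, filename), ...]}.

    Each list is sorted by channel first and slice second, so consecutive
    entries of the same channel can be stacked before channels are merged.
    """
    groups = defaultdict(list)
    for name in filenames:
        match = NAME_PATTERN.match(os.path.basename(name))
        if match is None:
            continue  # skip files that do not follow the assumed scheme
        entry = (int(match.group("c")), int(match.group("z")), name)
        groups[match.group("fov")].append(entry)
    return {fov: sorted(items) for fov, items in sorted(groups.items())}


def scan_folder(folder):
    """List a folder with os.listdir and group its TIF files by FOV."""
    return group_by_fov(os.listdir(folder))
```

Calling `group_by_fov` on a mixed list of names returns one entry per FOV and silently ignores files that do not match the pattern, which is the point where a real script would decide when a stack is complete and can be saved.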
Context:
In Proximity Ligation Assays (PLA) we quantify the number of red puncta/dots in blue+green+red images, which reflects the interactions between two proteins. We draw a region of interest (ROI) around individual cells to quantify the interactions per cell, and a bigger ROI for representative figures or data presentation. In my PLA experiments, I quantified around 10,000 individual cells using the ImageJ software and created the tools described below to automate the saving and opening of ROIs.
Problems:
Solution:
Roi and RoiManager modules from the ij library to get the content of the ROI manager into a variable I can iterate through.
Description: This tool saves the two sets of ROIs, as described above, in a temporary folder. It has no requirements other than an open image with all the ROIs added to the manager, but the outputs have to be moved manually to their appropriate destination before processing the next image.
Preview of the script:
Description: This tool is split into two sections: the top part makes the rectangular ROIs and the jpg preview (as mentioned in Tool 01), and the second part makes the ROIs with the polygon tool. The idea behind the split is that the experimenter can pre-select the cells they want analyzed by drawing the rectangular ROIs, and then someone else can use the jpg images to draw the polygons, which is the time-consuming step.
Preview of the script:
Description: This tool opens the two sets of ROIs that were drawn for an image. It has no requirements other than a folder structure as described above (Processed…, ROIs). After opening a processed image (MAX projection TIF), the script can load the ROIs in case you need to review, delete, or add more ROIs (then just run Tool 01).
Preview of the script:
Context:
We quantify the PLA interactions (red dots) in our images using the ImageJ software and the regions of interest (ROIs) generated with the tools described above. For my experiments, I wanted to test two ways of quantifying the objects in my images: 1) applying a threshold method from the ImageJ tools (Image–>Adjust–>Threshold) or 2) using the Find Maxima tool in ImageJ (Process–>Find Maxima). Both methods are followed by object counting with the Analyze Particles tool in ImageJ (Analyze–>Analyze Particles). I also wanted to measure the area of the cells in case normalization was required (Analyze–>Measure).
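To make the first approach concrete, here is a minimal pure-Python sketch of "threshold, then count objects". It is a simplified stand-in written for this page, not ImageJ's Threshold or Analyze Particles implementation: the image is a plain list of lists, pixels are grouped with 4-connectivity, and the function name is hypothetical.

```python
def count_objects(image, threshold):
    """Return the pixel area of every object brighter than `threshold`.

    `image` is a non-empty list of equally long rows of intensities.
    Pixels above the threshold that touch horizontally or vertically
    (4-connectivity, chosen here for simplicity) belong to one object.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood-fill the object that contains pixel (r, c).
            stack = [(r, c)]
            seen[r][c] = True
            area = 0
            while stack:
                y, x = stack.pop()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and image[ny][nx] > threshold:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            areas.append(area)
    return areas
```

The number of puncta is `len(count_objects(image, threshold))`, and the returned areas play the role of the per-object measurements that a results table would hold.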
Problems:
Solution:
ij library as the tools to handle ROIs (see above). Additionally, I used the ResultsTable module, which was very important for handling the pop-up windows with the results.
Preview of the script:
Context:
Once I had used the script shown above to quantify the Proximity Ligation Assay (PLA) puncta, I wanted to put the cropped images and results into a PowerPoint presentation to quickly and easily review the 100-500 cells I quantified per condition (total ~10,000). I also wanted one presentation for each of the two quantification methods tested (see script above) to determine which was more appropriate for my experiments.
Problems:
Pandas and Numpy do not work there.
Solution:
python-pptx, which was developed recently (functionality limited but expanding) and allows the creation of PowerPoint presentations from Python code.
Preview of the script:
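A rough sketch of how python-pptx can place cropped images and their results on slides is shown below. It is not the script previewed on this page: the grid layout, the sizes in inches, and the function names are assumptions for illustration. python-pptx is imported inside `build_deck` so that the layout helper can be used without the library installed.

```python
def grid_positions(n_items, n_cols, cell_w, cell_h, margin):
    """Return (left, top) offsets in inches for items laid out row by row."""
    return [
        (margin + (i % n_cols) * cell_w, margin + (i // n_cols) * cell_h)
        for i in range(n_items)
    ]


def build_deck(image_paths, captions, out_path, per_slide=6, n_cols=3):
    """Write one picture and one caption per grid cell, several per slide."""
    from pptx import Presentation  # third-party: pip install python-pptx
    from pptx.util import Inches

    prs = Presentation()
    blank_layout = prs.slide_layouts[6]  # blank layout of the default template
    pairs = list(zip(image_paths, captions))
    for start in range(0, len(pairs), per_slide):
        slide = prs.slides.add_slide(blank_layout)
        chunk = pairs[start:start + per_slide]
        # Cell size and margin are illustrative values for a 10 x 7.5 in slide.
        positions = grid_positions(len(chunk), n_cols, 3.0, 3.25, 0.5)
        for (path, caption), (left, top) in zip(chunk, positions):
            slide.shapes.add_picture(
                path, Inches(left), Inches(top), height=Inches(2.5)
            )
            box = slide.shapes.add_textbox(
                Inches(left), Inches(top + 2.5), Inches(2.8), Inches(0.5)
            )
            box.text_frame.text = caption  # e.g. the puncta count for that cell
    prs.save(out_path)
```

With six cells per slide, 100-500 cells per condition come to roughly 17-84 slides, which is why generating the deck from code is far more practical than assembling it by hand.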
Context:
In 2019, the Broad Institute and Novartis published huge datasets resulting from a collaboration to make available distinct measurements (RNA, metabolites, mutations, etc.) of a panel of over 1500 human cancer cell lines. In our lab (Mulligan), we have multiple cancer cell lines for which we wanted to get the RNASeq data to do some exploratory studies looking for insights on the expression levels of specific proteins. The data can be found either in cBioPortal for Cancer Genomics or directly from the CCLE website.
UPDATE: A second notebook was created for the most recent release of the dataset (now integrated into a bigger project called DepMap) and can be found HERE. It works the same way as the first notebook, whose preview and tutorial are shown here.
Problems:
Solution:
pandas dataframe operations and user inputs to make a search tool, and extract only the required columns (all genes/rows) into a .csv file.
Preview of the script:
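The search-and-extract idea can be sketched as follows. The column names (one column per cell line plus a gene-name column called `Hugo_Symbol`), the tab-separated input, and both function names are assumptions made for this example, so they should be adapted to the actual file. pandas is imported inside the second function so that the search helper can be tried on its own.

```python
def find_matching_columns(columns, query):
    """Case-insensitive substring search over column names."""
    needle = query.strip().lower()
    if not needle:
        return []
    return [col for col in columns if needle in str(col).lower()]


def extract_cell_lines(table_path, queries, out_path, gene_column="Hugo_Symbol"):
    """Save the gene column plus every column matching any query to a .csv."""
    import pandas as pd  # third-party: pip install pandas

    # Assumption: the expression table is tab-separated, one row per gene.
    df = pd.read_csv(table_path, sep="\t")
    keep = [gene_column]
    for query in queries:
        for col in find_matching_columns(df.columns, query):
            if col not in keep:
                keep.append(col)
    # All rows (genes) are kept; only the columns are filtered.
    df[keep].to_csv(out_path, index=False)
    return keep
```

In a notebook, the `queries` list could come from `input()` prompts, and the returned list of kept columns gives the user a quick check of which cell lines were matched.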
Context:
A short project looked at breast cancer data available in the cBioPortal for Cancer Genomics server. The study used the METABRIC dataset published in Nature journals (2012 and 2016), which has just over 2500 tumour samples. The aim of the project was to evaluate patient survival through Kaplan-Meier (KM) plots and correlate it with the expression levels of pairs of proteins (the RET receptor + ~50 hints we got from synthetic lethality assays). Our hypothesis was that the survival of a patient should increase when RET and any of the other hints were expressed at low levels in the patient, partially mimicking the concept of synthetic lethality (less expression of the pair of proteins –> tumour cells die or do not proliferate as much –> the patient lives longer).
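For readers unfamiliar with the Kaplan-Meier estimate, the sketch below computes it by hand for one group of patients, together with the time at which estimated survival first reaches 50%. It is a plain-Python illustration with hypothetical function names, not the lifelines implementation, and it leaves out confidence intervals and plotting.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] for each time with a death.

    durations: follow-up time of each patient.
    events: 1 if the death was observed, 0 if the patient was censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

To compare subgroups, for example patients with low versus high expression of a pair of proteins, the same two functions would be applied to each subgroup separately and the resulting curves and median times compared.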
Problems:
Solution:
KaplanMeierFitter module from the lifelines library, also retrieving key data like time to 50% survival for all subgroups.
Preview of the script:
UPDATE:
Since I became very interested in this type of analysis, I kept learning on my own for 1-2 years after the small projects finished. I then found an approach to generalize the data pre-processing and analysis, and created a second tool that used ipywidgets to interactively get user input for visualizing and selecting the variables and the ways to subdivide the dataset before plotting the KM curves. This second tool was significantly more complex, but it allowed me to easily explore different combinations of variables and subgroups to gain insights about this breast cancer study. Although the second tool did not require the user to modify any code blocks, the widget layout I used would not render properly in Google Colab, so the user would need to know how to install and work in JupyterLab. For that reason, and because of some difficulties with widget behavior, I stopped that project (Version 04).
Preview of the interactive version: