We had a large amount of audiogram data from Mass Eye and Ear that had been manually entered into Excel spreadsheets. It was difficult to read, and each dataset had a set of images attached to it with no known linkage. There was also patient data to sort through, and multiple versions of all of this data needed to be quality checked.
1. Create a database
2. Quality check the existing data
3. Figure out how to link images to the dataset
4. Prototype a web application
5. Develop the application!
Whiteboarded and set up a PostgreSQL database hosted on AWS. This process involved a lot of iteration because the scientists' requirements often changed as a result of meetings and experiments.
Cross-validated the data using Python to check for missing and incorrect information.
We had over 7.5 terabytes of images sitting in an S3 bucket, unlinked to any audiograms. We set up an image management platform called OMERO to store the images and link them to the appropriate datasets based on tray numbers.
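A minimal sketch of what a check like this could look like. The field names (`patient_id`, `ear`, `frequency_hz`, `threshold_db`) and the accepted threshold range are assumptions made for illustration, not the actual schema or rules used in the project.

```python
# Hypothetical required columns for one audiogram row (assumed, not the real schema).
REQUIRED_FIELDS = ("patient_id", "ear", "frequency_hz", "threshold_db")

# Assumed plausible range for a hearing threshold, in dB.
MIN_THRESHOLD_DB = -10
MAX_THRESHOLD_DB = 120


def _text(value):
    """Normalize a cell to a stripped string; None becomes an empty string."""
    return "" if value is None else str(value).strip()


def check_row(row):
    """Return a list of human-readable problems found in one row."""
    problems = []

    # Missing information: any required field that is absent or blank.
    for field in REQUIRED_FIELDS:
        if _text(row.get(field)) == "":
            problems.append(f"missing {field}")

    # Incorrect information: ear must be left or right.
    ear = _text(row.get("ear")).upper()
    if ear and ear not in {"L", "R"}:
        problems.append(f"invalid ear: {ear}")

    # Incorrect information: threshold must be numeric and in range.
    threshold = _text(row.get("threshold_db"))
    if threshold:
        try:
            value = float(threshold)
        except ValueError:
            problems.append(f"non-numeric threshold_db: {threshold}")
        else:
            if not MIN_THRESHOLD_DB <= value <= MAX_THRESHOLD_DB:
                problems.append(f"threshold_db out of range: {value}")

    return problems
```

Running `check_row` over every row of each spreadsheet version gives a per-row list of issues that can be reviewed with the scientists before anything is loaded into the database.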
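The tray-number linking can be sketched roughly as follows. This only illustrates the matching idea in plain Python and does not use the OMERO API. The file-naming convention (`tray` followed by digits somewhere in the object key) and the `tray_to_dataset` lookup are hypothetical.

```python
import re
from collections import defaultdict

# Assumed convention: the tray number appears in the key as "tray12", "tray_12" or "tray-12".
TRAY_PATTERN = re.compile(r"tray[-_]?(\d+)", re.IGNORECASE)


def link_images_to_datasets(image_keys, tray_to_dataset):
    """Group image keys by dataset using the tray number found in each key.

    Returns (linked, unlinked): a dict of dataset -> list of keys, and a
    list of keys that had no tray number or no matching dataset.
    """
    linked = defaultdict(list)
    unlinked = []
    for key in image_keys:
        match = TRAY_PATTERN.search(key)
        dataset = tray_to_dataset.get(int(match.group(1))) if match else None
        if dataset is None:
            unlinked.append(key)  # keep these for manual review
        else:
            linked[dataset].append(key)
    return dict(linked), unlinked
```

Keeping the unmatched keys in a separate list makes it easy to see how much of the bucket still needs manual attention.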
I mainly prototyped in Adobe XD. This involved setting up meetings with scientists to figure out how best to implement this web application. Scientists wanted a visual representation of the audiograms, as well as a way to search through patient notes and other criteria.
I mostly worked on the front end of this project, using React, HTML, and CSS. The back end uses Node.js and PostgreSQL.
Decibel is a biotech company with many studies being conducted. Currently, each team stores its study information in different places and in different ways. The resulting lack of consistency and searchability makes the organization less transparent. There is also a lot of manual data entry that could be automated to produce more accurate results.
Study Tracker: in-house study tracking software that will be a key part of the Decibel Data ecosystem. It will be the foundation on which scientific questions are answered. Study Tracker's features include assigning a unique ID to each study, integrating with other technologies used throughout the company, applying metadata from controlled vocabularies to each study, and providing an easily searchable interface for finding those studies. I primarily worked on the metadata component for Study Tracker.
1. Controlled vocabulary (ex. Genes, Viruses, Plasmids)
2. Data comes from different sources (ex. NCBI, ChemAxon, etc.)
3. Studies need to be tagged
4. Images need to be tagged
5. This ecosystem needs to be searchable
6. Metadata component must be reusable
I mainly prototyped in Adobe XD. For this particular component I did a lot of research into other scientific websites and how they implement metadata search. I found that the component needed to hold long text strings, so I had to design with that in mind. The first version I created was too complicated and showed too much information, so I split the search component into two parts: the user first filters by type of metadata, and then searches for the specific data point. This makes the component less confusing, and in user testing people found what they were looking for more quickly.
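The two-part search can be illustrated with a small sketch. This is only the filtering logic in plain Python, not the React component itself, and the `MetadataTerm` class and sample terms are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataTerm:
    kind: str   # type of metadata, e.g. "Gene", "Virus", "Plasmid"
    label: str  # display text, which may be a long sentence


def search_metadata(terms, kind, query):
    """Step 1: keep only terms of the chosen kind.
    Step 2: keep those whose label contains the query (case-insensitive)."""
    needle = query.strip().lower()
    of_kind = [term for term in terms if term.kind == kind]
    return [term for term in of_kind if needle in term.label.lower()]


# Hypothetical sample vocabulary.
TERMS = [
    MetadataTerm("Gene", "example gene alpha"),
    MetadataTerm("Gene", "example gene beta"),
    MetadataTerm("Virus", "example virus alpha"),
]
```

Filtering by type first means a query such as `"alpha"` only returns matches within the selected vocabulary, which keeps the result list short even when labels are long.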
1. Elasticsearch
An open-source, RESTful, distributed search and analytics engine. I used Elasticsearch to index the data coming from various APIs. I wrote Python scripts to sort through the data from the various inventories. Elasticsearch then stores this data with appropriate unique IDs and makes each document searchable.
2. React
I used React to build the actual search component. I did a lot of styling based on Decibel guidelines to make the component sleeker.
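The Elasticsearch indexing step described above might be sketched like this. The record fields, the index name, and the ID scheme (`source:external_id`) are assumptions for illustration. The dictionaries follow the action format accepted by the `bulk` helper in the official Python client (`elasticsearch.helpers.bulk`), but the sketch itself only builds the actions and does not talk to a cluster.

```python
def build_bulk_actions(records, index_name):
    """Turn inventory records into bulk-index actions with stable unique IDs.

    Each record is assumed to be a dict with "source", "external_id",
    "type" and "label" keys (hypothetical field names).
    """
    for record in records:
        # Combining source and external ID keeps IDs unique across inventories
        # and makes re-indexing overwrite rather than duplicate a document.
        doc_id = f"{record['source']}:{record['external_id']}"
        yield {
            "_index": index_name,
            "_id": doc_id,
            "_source": {
                "type": record["type"],
                "label": record["label"],
                "source": record["source"],
            },
        }


# Hypothetical sample record.
SAMPLE_RECORDS = [
    {"source": "NCBI", "external_id": "0001", "type": "Gene", "label": "example gene"},
]
```

With a configured client, the generator could then be passed along as `helpers.bulk(client, build_bulk_actions(records, "metadata"))`, where `client` and the index name `"metadata"` are placeholders.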