Romina Caraba Data Scientist

My Expertise

Hi there! I’m Romina Caraba-Thampi, a Data Scientist with a background in machine learning, backend development, and analytics. I currently work at Equitable, where I collaborate closely with underwriting teams to build predictive models and data-driven tools using Python and SQL. My work focuses on enhancing underwriting decision systems through automated daily and monthly reporting pipelines, reducing manual workload and enabling faster, more informed risk assessments. Previously, I worked at Metrolinx, where I developed predictive models and analytics solutions to optimize transportation systems and support operational planning.

With over five years in tech, my experience spans data science, software engineering, and teaching. I’ve worked with Python, SQL, Go, R, and PHP, along with cloud and analytics tools such as AWS, GCP, Power BI, and Docker. Whether it’s building scalable data pipelines, designing machine learning models, or creating intuitive dashboards, I specialize in transforming complex data into clear, actionable insights that drive real-world impact.

Beyond my day-to-day work, I’m passionate about mentoring and supporting others in tech. I offer resume reviews, GitHub portfolio optimization, and mock interviews, helping individuals break into the industry with confidence.

Data Science & Programming

Focus: Predictive modeling & forecasting

Building predictive models, data pipelines, and analytical solutions.

Data & Databases

Focus: SQL optimization & data modeling

Working with structured data, querying, and designing efficient schemas.

Cloud & Infrastructure

Focus: Scalable data pipelines & deployment

Deploying and managing cloud-based data and machine learning systems.

Machine Learning & Analytics

Focus: Decision systems & business insights

Designing models, APIs, and dashboards that drive real-world impact.

Tools & Workflow

Focus: Collaboration & production workflows

Version control, development environments, and team collaboration.

Featured Projects


Udacity Course: Near Earth Object Detection

  • Machine Learning, Python, Data Analysis, Pandas, JSON

In March 2022, I completed a nanodegree course in Intermediate Python from Udacity. The goal of one of its projects was to search and explore near-Earth objects using data from NASA/JPL's Center for Near Earth Object Studies. This project is based on the Udacity nanodegree found here.

Check it out
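
The search step of a project like this can be sketched as loading close-approach records and filtering them. This is a minimal sketch, not the course's solution: it assumes a JSON layout with a `fields` list and a `data` list of rows (similar in spirit to CNEOS close-approach exports), and the sample records, the field names (`des`, `cd`, `dist`, `v_rel`) and the `CloseApproach`/`search` names are hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical sample in a "fields" + "data" layout; values are made up.
SAMPLE_JSON = """
{
  "fields": ["des", "cd", "dist", "v_rel"],
  "data": [
    ["2020 AB", "2020-Jan-01 12:00", "0.0105", "12.3"],
    ["433", "2020-Jan-02 03:30", "0.2000", "5.9"],
    ["2019 XY", "2020-Jan-03 20:15", "0.0450", "20.1"]
  ]
}
"""


@dataclass
class CloseApproach:
    designation: str
    time: str
    distance_au: float    # approach distance in astronomical units
    velocity_km_s: float  # relative velocity in km/s


def load_approaches(text):
    """Parse the JSON text into CloseApproach records."""
    payload = json.loads(text)
    fields = payload["fields"]
    approaches = []
    for row in payload["data"]:
        record = dict(zip(fields, row))  # map field names to row values
        approaches.append(CloseApproach(
            designation=record["des"],
            time=record["cd"],
            distance_au=float(record["dist"]),
            velocity_km_s=float(record["v_rel"]),
        ))
    return approaches


def search(approaches, max_distance_au=None, min_velocity_km_s=None):
    """Yield the approaches that satisfy every filter that was provided."""
    for approach in approaches:
        if max_distance_au is not None and approach.distance_au > max_distance_au:
            continue
        if min_velocity_km_s is not None and approach.velocity_km_s < min_velocity_km_s:
            continue
        yield approach


if __name__ == "__main__":
    approaches = load_approaches(SAMPLE_JSON)
    for match in search(approaches, max_distance_au=0.05, min_velocity_km_s=10):
        print(match.designation, match.time, match.distance_au)
```

Using a generator for `search` keeps filtering lazy, so large files can be scanned without building an intermediate list.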

Udacity Course: Meme Generator

  • Python, Interactive Web Interface, Numpy, Pandas, Flask

In March 2022, I completed a nanodegree course in Intermediate Python. The goal of one of the projects was to create a meme generator: given a set of images and a list of quotes, it generates a meme. See the course info at Udacity Intermediate Python.

Check it out
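
The core of the meme generator described above, pairing a randomly chosen image with a randomly chosen quote, can be sketched like this. It is a minimal sketch under assumptions, not the course's reference solution: the `Quote` class, the caption format and the file names are hypothetical, and drawing the caption onto the image (for example with an imaging library) is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Quote:
    body: str
    author: str

    def __str__(self):
        # Hypothetical caption format: "body" - author
        return f'"{self.body}" - {self.author}'


def generate_meme(image_paths, quotes, rng=random):
    """Pick one image and one quote at random and return (image, caption).

    Only the selection step is shown; rendering the caption is left out.
    """
    if not image_paths or not quotes:
        raise ValueError("need at least one image and one quote")
    image = rng.choice(image_paths)
    quote = rng.choice(quotes)
    return image, str(quote)


if __name__ == "__main__":
    images = ["dog_1.jpg", "dog_2.jpg"]  # hypothetical file names
    quotes = [Quote("Hello there", "A. Dog"), Quote("Fetch later", "B. Dog")]
    # A seeded random.Random makes the pick reproducible for testing.
    print(generate_meme(images, quotes, rng=random.Random(0)))
```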

Canadian COVID-19 Data Study and Prediction

  • Data Analysis, Python, Jupyter Notebook

In May 2020, I analysed the COVID-19 data published by the Government of Canada. The data file can be found on the website, under “Current situation”. I used Atom to write Python code implementing an SEIR model (Susceptible, Exposed, Infected, Recovered). To solve this model, I used the fourth-order Runge-Kutta method, with S, E, I, and R as the four state variables. Then, I fitted a linear regression model to the May data to predict the June data.

Check it out
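
The SEIR model and fourth-order Runge-Kutta integration described above can be sketched as follows. This is a minimal illustration, not the original study's code: it assumes the standard SEIR equations with transmission rate `beta`, incubation rate `sigma` and recovery rate `gamma` (notation not given in the original), and all numbers in the usage section are hypothetical. `fit_line` is a plain least-squares line of the kind used for the May-to-June extrapolation.

```python
# State vector: (S, E, I, R) = susceptible, exposed, infected, recovered.


def seir_derivatives(state, beta, sigma, gamma):
    """Right-hand side of the standard SEIR equations."""
    s, e, i, r = state
    n = s + e + i + r
    new_infections = beta * s * i / n
    return (
        -new_infections,             # dS/dt
        new_infections - sigma * e,  # dE/dt
        sigma * e - gamma * i,       # dI/dt
        gamma * i,                   # dR/dt
    )


def rk4_step(state, h, beta, sigma, gamma):
    """Advance the four-dimensional state by one classic RK4 step of size h."""
    def f(y):
        return seir_derivatives(y, beta, sigma, gamma)

    def shift(y, k, scale):
        return tuple(yi + scale * ki for yi, ki in zip(y, k))

    k1 = f(state)
    k2 = f(shift(state, k1, h / 2))
    k3 = f(shift(state, k2, h / 2))
    k4 = f(shift(state, k3, h))
    return tuple(
        y + h / 6 * (a + 2 * b + 2 * c + d)
        for y, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, days, steps_per_day, beta, sigma, gamma):
    """Return the initial state followed by the state at the end of each day."""
    h = 1.0 / steps_per_day
    history = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, h, beta, sigma, gamma)
        history.append(state)
    return history


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


if __name__ == "__main__":
    # Hypothetical population and parameters, not the values used in the study.
    history = simulate((999_000.0, 500.0, 500.0, 0.0), days=30, steps_per_day=10,
                       beta=0.3, sigma=1 / 5.2, gamma=1 / 10)
    print(f"infected after 30 days: {history[-1][2]:.0f}")

    # Hypothetical daily counts: fit days 1-5 and extrapolate to day 10.
    slope, intercept = fit_line([1, 2, 3, 4, 5], [100.0, 110.0, 120.0, 130.0, 140.0])
    print(f"day 10 estimate: {slope * 10 + intercept:.0f}")
```

Because the four derivatives sum to zero, the total population S + E + I + R stays constant up to floating-point error, which is a handy sanity check on the integrator.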

Cancer Detection Project - Image Recognition

  • Machine Learning, Python, Data Analysis

In summer 2019, I worked on a capstone project to complete a Data Science diploma from BrainStation. For this project, I developed my own convolutional neural network using imaging data of lymph node sections. The model was 80-87% accurate, which was sufficient for the given circumstances. The data was provided by the Patch Camelyon Challenge.

Check it out
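
For context on the convolutional network mentioned above, the operations at the core of a CNN layer (convolution, ReLU activation and max pooling) can be sketched in plain Python. This is an illustrative forward pass only, with no training, and it is not the model from the capstone; the function names and the toy input are hypothetical. A real image model would be built with a deep-learning framework.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution with stride 1 and no padding.

    Like most deep-learning libraries, this computes cross-correlation,
    i.e. the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            # Sum of element-wise products between the kernel and the window.
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    """Clamp negative activations to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, len(feature_map[0]) - 1, 2)]
        for r in range(0, len(feature_map) - 1, 2)
    ]


if __name__ == "__main__":
    # Toy 5x5 "image" with a vertical edge and a vertical-edge detector kernel.
    image = [[0, 0, 1, 1, 1] for _ in range(5)]
    kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
    print(max_pool2x2(relu(conv2d(image, kernel))))
```

In a trained CNN the kernel values are learned rather than hand-written, and many such filters are stacked in each layer.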

Trend Analysis of Crimes in Greater Toronto Area (GTA)

  • Data Analysis, R, Tableau, Python

For a summer school project, I analyzed publicly available data from the Toronto Police on crime rates. Interestingly enough, the hours with the most crimes are 12 am and 12 pm. To see which areas of the GTA are the most “dangerous”, please refer to the Jupyter Notebook below.

Check it out
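
Finding the peak crime hours mentioned above comes down to counting incidents per hour of the day. A minimal sketch, assuming each record is a dict with a hypothetical `occurrence_hour` field on a 0-23 clock (so 12 am is hour 0 and 12 pm is hour 12); the sample records are made up.

```python
from collections import Counter


def incidents_by_hour(records):
    """Count incidents per hour of day (0-23)."""
    return Counter(record["occurrence_hour"] for record in records)


def peak_hours(records, top=2):
    """Return the `top` hours with the most incidents, busiest first."""
    return [hour for hour, _count in incidents_by_hour(records).most_common(top)]


if __name__ == "__main__":
    # Hypothetical records; a real analysis would load the published dataset.
    sample = [{"occurrence_hour": h} for h in [0, 0, 0, 12, 12, 5, 17]]
    print(peak_hours(sample))
```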

Canadian Space App Challenge

  • Data Analysis, Data Cleaning, Excel, Python

In November 2019, I participated in a hackathon organized by the Canadian Space Agency (CSA) and the National Aeronautics and Space Administration (NASA). My team won the challenge with our project on the analysis of neutron radiation. We compared data collected on the International Space Station (ISS) with data collected at schools across Canada. Much of the data had missing values or incorrectly collected answers, so we replaced the missing values with approximations based on the median of schools in the surrounding area.

Check it out
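
The median-based imputation described above can be sketched as grouping readings by area and filling each gap with that area's median. This is a minimal sketch, not the team's actual code: the record layout (`school`, `area`, `reading` keys), the use of `None` for missing values, and the sample data are all hypothetical.

```python
from statistics import median


def impute_by_area_median(readings):
    """Fill missing readings (None) with the median of the same area's known readings.

    `readings` is a list of dicts with hypothetical keys "school", "area" and
    "reading". Areas with no known readings are left as None. The input list
    is not modified; filled rows are returned as new dicts.
    """
    known = {}
    for row in readings:
        if row["reading"] is not None:
            known.setdefault(row["area"], []).append(row["reading"])
    area_median = {area: median(values) for area, values in known.items()}

    filled = []
    for row in readings:
        if row["reading"] is None:
            row = {**row, "reading": area_median.get(row["area"])}
        filled.append(row)
    return filled


if __name__ == "__main__":
    data = [
        {"school": "A", "area": "north", "reading": 1.0},
        {"school": "B", "area": "north", "reading": 3.0},
        {"school": "C", "area": "north", "reading": None},
    ]
    print(impute_by_area_median(data))
```

The median is less sensitive than the mean to the incorrectly collected outliers mentioned above, which makes it a reasonable default for filling gaps.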