
Advanced Data Analytics with AutoML integration on Vertex AI

My Role: Data Analyst, AI Strategist

Deliverables: Data Preprocessing, Feature Engineering, Model Deployment and Evaluation

Project Specs: Jupyter + Python, Google Cloud, Vertex AI

Duration: 2 weeks

Project Overview:

Objective: To leverage advanced data analytics techniques and machine learning to develop predictive applications that enhance decision-making.

Context: This capstone project is part of the Google Advanced Data Analytics program, aimed at demonstrating the practical application of integrated data analytics and ML technologies on Google Cloud’s Vertex AI platform.

Goal: Use Vertex AI to apply machine learning for more accurate predictions and automated analytics.

Data Collection: Provided by Google

Tools and Technologies: Used Python for data cleaning and preprocessing, and Google Cloud Storage for hosting the dataset


Data Cleansing and Preparation: Standardized data formats, handled missing values, and removed outliers to prepare a clean dataset.
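
A minimal sketch of these steps in pandas, assuming an illustrative file name and NYC-taxi-style column names (fare_amount, trip_distance, tpep_pickup_datetime, tpep_dropoff_datetime):

    import pandas as pd

    # Load the raw trip data (file name and column names are illustrative assumptions).
    df = pd.read_csv("taxi_trips.csv")

    # Standardize formats: parse pickup and dropoff timestamps as datetimes.
    df["tpep_pickup_datetime"] = pd.to_datetime(df["tpep_pickup_datetime"])
    df["tpep_dropoff_datetime"] = pd.to_datetime(df["tpep_dropoff_datetime"])

    # Handle missing values: drop rows missing the target or the key predictor.
    df = df.dropna(subset=["fare_amount", "trip_distance"])

    # Remove outliers with a 1.5 * IQR rule on the fare amount.
    q1, q3 = df["fare_amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[df["fare_amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]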

Exploratory Data Analysis (EDA): Conducted thorough analyses to explore patterns and relationships in the data.
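
For example, a few lines of pandas on the cleaned DataFrame from the step above cover the core of this analysis:

    # Summary statistics and the correlation between distance and fare.
    print(df[["fare_amount", "trip_distance"]].describe())
    print(df[["fare_amount", "trip_distance"]].corr())

    # Trip volume by pickup hour, to surface time-of-day usage patterns.
    print(df["tpep_pickup_datetime"].dt.hour.value_counts().sort_index())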

Feature Engineering: 

  • Initial Focus: The primary approach was to derive features from the pickup datetime, focusing on trip duration and other time-related aspects that are critical for understanding patterns in customer usage and estimating fares (a sketch follows this list).

  • Future Enhancements: To further enhance the model’s accuracy and complexity, additional data points such as latitude, longitude, and holiday schedules should be considered. Incorporating these variables could significantly refine fare predictions by accounting for geographical influences and special timing considerations.
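
A sketch of the datetime-derived features, continuing from the cleaned DataFrame above (feature names are illustrative):

    # Trip duration in minutes, plus time-of-day and day-of-week signals.
    df["trip_duration_min"] = (
        df["tpep_dropoff_datetime"] - df["tpep_pickup_datetime"]
    ).dt.total_seconds() / 60
    df["pickup_hour"] = df["tpep_pickup_datetime"].dt.hour
    df["pickup_dayofweek"] = df["tpep_pickup_datetime"].dt.dayofweek
    df["is_weekend"] = df["pickup_dayofweek"].isin([5, 6]).astype(int)
    df["is_rush_hour"] = df["pickup_hour"].isin([7, 8, 9, 16, 17, 18]).astype(int)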

Methodology:

Scatter plot: relationship between Trip Distance and Fare Amount, with outliers
Scatter plot: relationship between Trip Distance and Fare Amount, outliers removed
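
These plots can be reproduced with matplotlib; the sketch below assumes df_raw holds the data before outlier removal and df the cleaned frame from the preparation step:

    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    axes[0].scatter(df_raw["trip_distance"], df_raw["fare_amount"], s=2, alpha=0.3)
    axes[0].set_title("With outliers")
    axes[1].scatter(df["trip_distance"], df["fare_amount"], s=2, alpha=0.3)
    axes[1].set_title("Outliers removed")
    for ax in axes:
        ax.set_xlabel("Trip distance (miles)")
        ax.set_ylabel("Fare amount ($)")
    plt.tight_layout()
    plt.show()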

Implementing Machine Learning on Vertex AI

Model Selection: After the task was specified as regression, Vertex AI’s AutoML automatically selected the best model for predicting fare amounts, using temporal features such as trip duration and other time-based elements to improve accuracy.

Model Training and Deployment: The chosen model was trained and deployed automatically by Vertex AI’s AutoML after identifying regression as the task type. This included optimizing the model using a dataset enriched with temporal features, ensuring an efficient and scalable training environment.
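
The same workflow can also be driven programmatically with the google-cloud-aiplatform SDK rather than the Cloud Console; the sketch below uses placeholder project, bucket, and display names:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    # Tabular dataset created from the prepared CSV in Cloud Storage (placeholder path).
    dataset = aiplatform.TabularDataset.create(
        display_name="taxi-fares",
        gcs_source=["gs://my-bucket/taxi_fares_features.csv"],
    )

    # Regression task; AutoML handles model selection and tuning internally.
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="fare-regression",
        optimization_prediction_type="regression",
        optimization_objective="minimize-rmse",
    )
    model = job.run(
        dataset=dataset,
        target_column="fare_amount",
        budget_milli_node_hours=1000,  # one node-hour training budget
    )

    # Deploy the trained model to an endpoint for online predictions.
    endpoint = model.deploy(machine_type="n1-standard-4")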

Results and Impact

Evaluation Metrics:

Mean Absolute Error (MAE): 2.016 - Indicates a reasonable error level if fare amounts typically range between $10 and $100, but could be significant if fares are lower.

Mean Absolute Percentage Error (MAPE): 56.333% - Suggests that the model's predictions deviate significantly from actual values, indicating poor performance.

Root Mean Square Error (RMSE): 5.86 - Reflects potentially large errors, particularly if fare amounts are not high.

Root Mean Squared Logarithmic Error (RMSLE): 0.233 - Indicates moderate relative error in the predictions; because the metric is computed on log-transformed fares, it emphasizes percentage-style discrepancies and is less affected by large absolute errors on high fares.
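
For reference, the same four metrics can be recomputed locally from a batch-prediction export with scikit-learn; the arrays below are illustrative placeholders for actual and predicted fares:

    import numpy as np
    from sklearn.metrics import (
        mean_absolute_error,
        mean_absolute_percentage_error,
        mean_squared_error,
        mean_squared_log_error,
    )

    y_true = np.array([9.5, 14.0, 33.5, 7.0])   # actual fares (placeholder values)
    y_pred = np.array([11.2, 12.1, 28.9, 9.4])  # model predictions (placeholder values)

    mae = mean_absolute_error(y_true, y_pred)
    mape = mean_absolute_percentage_error(y_true, y_pred) * 100  # reported as a percentage
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))  # requires non-negative values
    print(f"MAE={mae:.3f}  MAPE={mape:.3f}%  RMSE={rmse:.3f}  RMSLE={rmsle:.3f}")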

Contextual Analysis: The high MAPE suggests that the model is not accurately predicting fares, which could stem from an overly simplistic model or insufficient feature representation. Both MAE and RMSE suggest that the model's predictions have moderate inaccuracies, which may be problematic depending on the typical fare range.

Lessons Learned and Reflections

This project really opened my eyes to how machine learning could work in the real world. It reinforced just how crucial clean, unbiased data is for any successful ML project. Diving deep into data management, preprocessing, and tuning not only boosted my skills but also made me really appreciate the precision needed in this field. And thank goodness for ChatGPT when I needed help with Python!

Working with Vertex AI was a game-changer. It showed me the power of Google Cloud's tools to handle machine learning models at scale. The platform could be particularly effective for prototyping and launching small AI features, allowing for innovation and learning without substantial upfront investment. I'm super excited to see what comes next.

Merging machine learning with data analytics didn't just up my technical game; it also showed me how vital these tools are for making smart business decisions and fostering innovative solutions.

I'm looking forward to pushing the boundaries of what I can do with ML in different industries and keeping up with all the rapid changes in this space.