Note: This project is currently in development.
Link to the project: https://github.com/jmdtanalyst/MSC_DA_CA2_Transport_Ireland
Report: MSC_DA_CA2_Jose_Mario.docx
This project analyzes and predicts bike service trends in Dublin and Washington D.C. It follows the CRISP-DM framework and draws on four primary datasets. The bike data was acquired from trusted sources, data.gov.ie and Capital Bikeshare's official website, and user reviews were collected from TripAdvisor and Reddit through APIs. After cleaning and transformation, the datasets were used to examine the distribution of bike stations, run statistical analyses, and train machine learning models.
Project Description
For this project, the CRISP-DM framework was applied to ensure a structured approach to data analysis.
Data Preparation and Visualization
1. Data Acquisition:
DublinBikes: Data collected from data.gov.ie.
Capital Bikeshare: Data sourced from Capital Bikeshare's official website.
Reviews: Data sourced from TripAdvisor and Reddit through APIs.
2. Data Cleaning and Engineering:
The datasets were cleaned by removing duplicates and handling null values. Statistical sampling was then applied to keep the analysis efficient.
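The cleaning steps described above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the column names (ride_id, start_station, duration_min), the toy records, the median fill strategy, and the 50% sample fraction are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical trip records; column names and values are illustrative only.
trips = pd.DataFrame(
    {
        "ride_id": ["a1", "a1", "b2", "c3", "d4", "e5"],
        "start_station": ["Dock 1", "Dock 1", "Dock 2", None, "Dock 3", "Dock 2"],
        "duration_min": [12.0, 12.0, None, 7.5, 30.0, 18.0],
    }
)

# 1. Remove exact duplicate rows.
deduped = trips.drop_duplicates()

# 2. Handle nulls: drop rows with no station (assumed unusable),
#    then fill missing durations with the median duration.
cleaned = deduped.dropna(subset=["start_station"]).copy()
cleaned["duration_min"] = cleaned["duration_min"].fillna(
    cleaned["duration_min"].median()
)

# 3. Draw a reproducible random sample to speed up later analysis.
sample = cleaned.sample(frac=0.5, random_state=42)

print(cleaned)
print(sample)
```

Fixing random_state makes the sample repeatable, which helps when results need to be compared across runs.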
3. Exploratory Data Analysis (EDA):
Visualizations were created using Plotly Express to analyze bike station distributions, trips by weekday, and sentiment analysis results.
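As a rough sketch of how the trips-by-weekday view could be prepared, the snippet below aggregates trip start times into one count per weekday. The started_at column and the sample timestamps are hypothetical; only the aggregation is shown, and the resulting frame is the kind of input a Plotly Express bar chart expects.

```python
import pandas as pd

# Hypothetical trip start times; "started_at" is an assumed column name.
trips = pd.DataFrame(
    {
        "started_at": pd.to_datetime(
            [
                "2024-01-01 08:15",  # Monday
                "2024-01-02 09:00",  # Tuesday
                "2024-01-06 14:30",  # Saturday
                "2024-01-08 17:45",  # Monday
                "2024-01-13 11:10",  # Saturday
                "2024-01-13 16:20",  # Saturday
            ]
        )
    }
)

weekday_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Count trips per weekday name and keep calendar order,
# filling weekdays without trips with zero.
trips_by_weekday = (
    trips["started_at"]
    .dt.day_name()
    .value_counts()
    .reindex(weekday_order, fill_value=0)
    .rename_axis("weekday")
    .reset_index(name="trips")
)

print(trips_by_weekday)

# The frame could then be charted with, for example:
#   plotly.express.bar(trips_by_weekday, x="weekday", y="trips")
```

Reindexing against an explicit weekday list keeps the bars in calendar order rather than sorted by count.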
Machine Learning for Data Analysis
Time Series Analysis:
Time-series analysis was performed with three regression models: RandomForestRegressor, linear regression, and ridge regression.
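One common way to apply these regressors to a time series is to turn the series into lagged features and evaluate on a chronological hold-out. The sketch below assumes that approach; the synthetic daily trip counts, the 7-day lag window, the 80/20 split, and the hyperparameters are illustrative choices, not the project's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error

# Synthetic daily trip counts with a weekly cycle (illustrative data only).
rng = np.random.default_rng(0)
days = np.arange(200)
daily_trips = 100 + 20 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 3, days.size)

# Build lag features: each row holds the previous n_lags days,
# and the target is the following day's count.
n_lags = 7
X = np.column_stack(
    [daily_trips[i : len(daily_trips) - n_lags + i] for i in range(n_lags)]
)
y = daily_trips[n_lags:]

# Chronological split: train on the past, test on the most recent 20%.
split = int(len(y) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

models = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "linear_regression": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
}

# Fit each model and record its mean absolute error on the hold-out period.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

for name, mae in scores.items():
    print(f"{name}: MAE = {mae:.2f}")
```

Splitting by time rather than at random avoids training on observations that come after the ones being predicted.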
Sentiment Analysis:
VADER was used to classify the sentiment of the reviews, achieving notable accuracies.
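VADER returns a compound score between -1 and 1 for each text, and its documentation suggests labelling scores of at least 0.05 as positive and at most -0.05 as negative, with anything in between treated as neutral. The sketch below shows that labelling step and an accuracy check against hand-labelled reviews. The helper names, compound scores, and reference labels are hypothetical, and whether the project used these exact thresholds is an assumption.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a sentiment label.

    The default threshold follows the convention in VADER's documentation.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def accuracy(predicted: list[str], expected: list[str]) -> float:
    """Share of predictions that match the reference labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Made-up compound scores; in practice each would come from
# SentimentIntensityAnalyzer().polarity_scores(text)["compound"].
compound_scores = [0.82, -0.61, 0.01, 0.30, -0.02]

# Made-up reference labels for the same five reviews.
reference_labels = ["positive", "negative", "neutral", "negative", "neutral"]

predicted_labels = [label_from_compound(score) for score in compound_scores]

print(predicted_labels)
print(accuracy(predicted_labels, reference_labels))
```

Because VADER is rule-based and needs no training, accuracy can only be measured where reviews carry a reference label, such as a star rating or a manual annotation.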
Conclusion
This project offers valuable insights into bike-sharing systems in Dublin and Washington D.C., providing data-driven analyses and predictions that can inform urban transportation planning and policy-making.