This project explores the application of Recurrent Neural Networks (RNNs) to predict daily bike-sharing trips. The primary goal is to develop and evaluate models that can accurately forecast the number of bike trips, which can be beneficial for bike-sharing companies to optimize their operations and resources.
You can find the code at: https://github.com/jmdtanalyst/Bike_Sharing_Trip_Prediction_RNN
Data Source: The dataset consists of daily bike-sharing trip records.
Data Ingestion: Utilized Apache Hadoop and PySpark to handle large datasets efficiently. CSV files were loaded into a Spark DataFrame and saved on the Hadoop Distributed File System (HDFS).
Data Cleaning and Transformation: The data was cleaned and transformed using PySpark, including converting timestamps to dates and aggregating trip counts by day.
Descriptive Statistics: Analyzed the central tendency, variability, and range of daily trips.
Distribution Analysis: Examined the distribution and skewness of total trips per day.
Visualization: Created histograms, box plots, and scatter plots to identify patterns and outliers.
Seasonality and Trend Analysis: Decomposed the time series data to extract trend, seasonality, and residual components.
Three models were developed and evaluated:
Simple LSTM: A basic Long Short-Term Memory (LSTM) model with a single LSTM layer.
LSTM with Hyperparameters: An enhanced LSTM model with multiple layers, dropout regularization, and hyperparameter tuning.
GRU Model: A Gated Recurrent Unit (GRU) model with a similar architecture to the enhanced LSTM.
Training: Models were trained using TensorFlow, with a look-back period of 1 day for input sequences.
Evaluation Metrics: Root Mean Squared Error (RMSE) was used to evaluate model performance on training and test datasets.
Results: The GRU model achieved the best performance with the lowest RMSE on the test set, indicating its robustness in capturing temporal dependencies.
Seasonal Patterns: The data exhibited clear seasonal patterns, with higher ridership in warmer months and weekdays.
Model Performance: While the GRU model performed slightly better, all models showed limitations in prediction accuracy, suggesting that additional features (e.g., weather data, holidays) could improve performance.
Future Work: Incorporating more contextual data and experimenting with different model architectures and hyperparameters could further enhance prediction accuracy.
Time Series Decomposition: Visualized the trend, seasonality, and residual components of the time series data.
Model Predictions: Plotted the actual vs. predicted trip counts for both training and test sets to illustrate model performance.
Data Processing: Apache Hadoop, PySpark
Modeling: TensorFlow, Keras
Visualization: Matplotlib, Seaborn
Programming Languages: Python
This project demonstrates the potential of RNNs for time series forecasting in the context of bike-sharing systems, highlighting both the strengths and areas for improvement in predictive modeling.