SQL and Excel
- Hamza Khaoua
- Mar 4, 2021
- 3 min read
Second project in my General Assembly Data analyst course:
The project guidelines can be found in this link : https://drive.google.com/file/d/1RU1FQlcqxkjj_oMhAQIhSb-DR2Qa0CIH/view?usp=sharing
Background
I am going to be looking at bike sharing organizations. (For a background on bicycle
sharing systems: https://en.wikipedia.org/wiki/Bicycle-sharing_system ). There are six
bike sharing organisations (in different countries) that I have data for.
● Baywheels (San Francisco Bay area)
● Blue bikes (Boston)
● Capital Bike share (Washington DC, Arlington and Alexandria)
● Citibike (New York)
● Divvy Bikes (Chicago)
● Santander (London)
Questions chosen:
Question 1:
a) Easy questions: How many trips were there in each month of each year?
b) Moderate questions: Is the subscription side of these businesses growing? (In absolute terms; and also, as a proportion of the sharing).
c) Harder questions: Is there a difference in growth between holiday activity and commuting activity?
Question 2:
a) Easy questions: What was the longest journey? What do we know about it?
b) Moderate questions: How often do bikes need to be relocated?
c) Harder questions: How far is a typical journey?
Answers:
Link to excel file and the presentation(mp4) : https://drive.google.com/drive/folders/17-tvD4jzrFPY9LO1VjHF6tc2wlbPAYMQ?usp=sharing
Question 1:
a) Easy questions: How many trips were there in each month of each year?

with bluebikes_Join as ( select * from bluebikes_2017
UNION ALL
select * from bluebikes_2018
UNION ALL
select * from bluebikes_2019)
select count(*) as trips_bluebike, date_part('month', start_time) as month,date_part('year', start_time) as year from bluebikes_Join
group by date_part('month', start_time),date_part('year', start_time)
order by year,month asc
b) Moderate questions: Is the subscription side of these businesses growing? (In absolute terms; and also, as a proportion of the sharing).

with bluebikes_Join as ( select * from bluebikes_2017
UNION ALL
select * from bluebikes_2018
UNION ALL
select * from bluebikes_2019),
bluebikes_table as (select count(user_type) as total_users,user_type,date_part('year',start_time) as year,date_part('month', start_time) as month from bluebikes_Join
group by date_part('year', start_time),date_part('month', start_time),user_type)
select * from bluebikes_table
order by year,month asc
c) Harder questions: Is there a difference in growth between holiday activity and commuting activity?

with bluebikes_Join as ( select * from bluebikes_2017
UNION ALL
select * from bluebikes_2018
UNION ALL
select * from bluebikes_2019),
bluebikes_table as (select distinct count(user_type) as total_users,date_part('year',start_time) as year,date_part('month', start_time) as month from bluebikes_Join
where user_type = 'Subscriber'
group by date_part('year', start_time),date_part('month', start_time))
select * from bluebikes_table
order by year,month asc
Question 2:
a) Easy questions: What was the longest journey? What do we know about it?

with L1_bbikes_2019 AS
(SELECT
b.bike_id,
b.start_time,
b.start_station_id,
b.end_station_id,
s.latitude as ss_latitude,
s.longtitude as ss_longitude
FROM bluebikes_2019 b
JOIN bluebikes_stations s ON b.start_station_id = s.id
WHERE b.start_station_id != b.end_station_id)
SELECT
max(calculate_distance(b.ss_latitude, b.ss_longitude,
e.latitude, e.longtitude,
'K')),
date_part('month',start_time) As month
FROM L1_bbikes_2019 b
JOIN bluebikes_stations e ON b.end_station_id = e.id
Group by month
Order by month
b) Moderate questions: How often do bikes need to be relocated?

WITH bike_cte AS (SELECT Distinct bike_id, start_time, start_station_id as start, end_station_id as stop
FROM bluebikes_2019
ORDER BY bike_id,start_time),
delay_cte AS (SELECT bike_id, start_time, start, LAG(stop, 1) OVER( Partition BY 1) as previous_stop FROM bike_cte),
moved_bike AS (SELECT bike_id, start_time, start, previous_stop, start!=previous_stop AS Moved FROM delay_cte),
total_rides AS (SELECT date_part('month',start_time) AS month, count(*) AS number_of_rides FROM bluebikes_2019
GROUP BY month
ORDER BY month),
total_moves AS (SELECT date_part('month',start_time) AS month, count(CASE WHEN moved THEN 1 END) AS number_of_moves
FROM moved_bike
GROUP BY month
ORDER BY month)
SELECT
m.month,
number_of_moves,
number_of_rides
FROM total_rides
JOIN total_moves m USING(month)
c) Harder questions: How far is a typical journey?

WITH
bluebikes_Join as ( select * from bluebikes_2017
UNION ALL
select * from bluebikes_2018
UNION ALL
select * from bluebikes_2019),
L1_bbikes_2019 AS
(SELECT
b.bike_id,
b.start_time,
b.start_station_id,
b.end_station_id,
s.latitude as ss_latitude,
s.longtitude as ss_longitude
FROM bluebikes_Join b
JOIN bluebikes_stations s ON b.start_station_id = s.id
WHERE b.start_station_id != b.end_station_id)
SELECT
AVG(calculate_distance(b.ss_latitude, b.ss_longitude,
e.latitude, e.longtitude,
'K')),
date_part('month',start_time) As month,
date_part('year',start_time) As year
FROM L1_bbikes_2019 b
JOIN bluebikes_stations e ON b.end_station_id = e.id
Group by year,month
Order by year,month
Comentários