top of page
Search

SQL and Excel

  • Writer: Hamza Khaoua
    Hamza Khaoua
  • Mar 4, 2021
  • 3 min read

Second project in my General Assembly Data analyst course:



Background

I am going to be looking at bike sharing organizations. (For a background on bicycle

sharing systems: https://en.wikipedia.org/wiki/Bicycle-sharing_system ). There are six

bike sharing organisations (in different countries) that I have data for.

● Baywheels (San Francisco Bay area)

● Blue bikes (Boston)

● Capital Bike share (Washington DC, Arlington and Alexandria)

● Citibike (New York)

● Divvy Bikes (Chicago)

● Santander (London)


Questions chosen:

Question 1:

a) Easy questions: How many trips were there in each month of each year?

b) Moderate questions: Is the subscription side of these businesses growing? (In absolute terms; and also, as a proportion of the sharing).

c) Harder questions: Is there a difference in growth between holiday activity and commuting activity?

Question 2:

a) Easy questions: What was the longest journey? What do we know about it?

b) Moderate questions: How often do bikes need to be relocated?

c) Harder questions: How far is a typical journey?


Answers:

Question 1:

a) Easy questions: How many trips were there in each month of each year?


with bluebikes_Join as ( select * from bluebikes_2017

UNION ALL

select * from bluebikes_2018

UNION ALL

select * from bluebikes_2019)

select count(*) as trips_bluebike, date_part('month', start_time) as month,date_part('year', start_time) as year from bluebikes_Join

group by date_part('month', start_time),date_part('year', start_time)

order by year,month asc


b) Moderate questions: Is the subscription side of these businesses growing? (In absolute terms; and also, as a proportion of the sharing).



with bluebikes_Join as ( select * from bluebikes_2017

UNION ALL

select * from bluebikes_2018

UNION ALL

select * from bluebikes_2019),

bluebikes_table as (select count(user_type) as total_users,user_type,date_part('year',start_time) as year,date_part('month', start_time) as month from bluebikes_Join

group by date_part('year', start_time),date_part('month', start_time),user_type)

select * from bluebikes_table

order by year,month asc


c) Harder questions: Is there a difference in growth between holiday activity and commuting activity?



with bluebikes_Join as ( select * from bluebikes_2017

UNION ALL

select * from bluebikes_2018

UNION ALL

select * from bluebikes_2019),

bluebikes_table as (select distinct count(user_type) as total_users,date_part('year',start_time) as year,date_part('month', start_time) as month from bluebikes_Join

where user_type = 'Subscriber'

group by date_part('year', start_time),date_part('month', start_time))

select * from bluebikes_table

order by year,month asc


Question 2:

a) Easy questions: What was the longest journey? What do we know about it?


with L1_bbikes_2019 AS

(SELECT

b.bike_id,

b.start_time,

b.start_station_id,

b.end_station_id,

s.latitude as ss_latitude,

s.longtitude as ss_longitude

FROM bluebikes_2019 b

JOIN bluebikes_stations s ON b.start_station_id = s.id

WHERE b.start_station_id != b.end_station_id)

SELECT

max(calculate_distance(b.ss_latitude, b.ss_longitude,

e.latitude, e.longtitude,

'K')),

date_part('month',start_time) As month

FROM L1_bbikes_2019 b

JOIN bluebikes_stations e ON b.end_station_id = e.id

Group by month

Order by month


b) Moderate questions: How often do bikes need to be relocated?



WITH bike_cte AS (SELECT Distinct bike_id, start_time, start_station_id as start, end_station_id as stop

FROM bluebikes_2019

ORDER BY bike_id,start_time),

delay_cte AS (SELECT bike_id, start_time, start, LAG(stop, 1) OVER( Partition BY 1) as previous_stop FROM bike_cte),

moved_bike AS (SELECT bike_id, start_time, start, previous_stop, start!=previous_stop AS Moved FROM delay_cte),

total_rides AS (SELECT date_part('month',start_time) AS month, count(*) AS number_of_rides FROM bluebikes_2019

GROUP BY month

ORDER BY month),

total_moves AS (SELECT date_part('month',start_time) AS month, count(CASE WHEN moved THEN 1 END) AS number_of_moves

FROM moved_bike

GROUP BY month

ORDER BY month)


SELECT

m.month,

number_of_moves,

number_of_rides

FROM total_rides

JOIN total_moves m USING(month)


c) Harder questions: How far is a typical journey?


WITH

bluebikes_Join as ( select * from bluebikes_2017

UNION ALL

select * from bluebikes_2018

UNION ALL

select * from bluebikes_2019),

L1_bbikes_2019 AS

(SELECT

b.bike_id,

b.start_time,

b.start_station_id,

b.end_station_id,

s.latitude as ss_latitude,

s.longtitude as ss_longitude

FROM bluebikes_Join b

JOIN bluebikes_stations s ON b.start_station_id = s.id

WHERE b.start_station_id != b.end_station_id)

SELECT

AVG(calculate_distance(b.ss_latitude, b.ss_longitude,

e.latitude, e.longtitude,

'K')),

date_part('month',start_time) As month,

date_part('year',start_time) As year

FROM L1_bbikes_2019 b

JOIN bluebikes_stations e ON b.end_station_id = e.id

Group by year,month

Order by year,month

 
 
 

Recent Posts

See All

Comentários


Post: Blog2_Post

©2021 by Hamza Khaoua. Proudly created with Wix.com

bottom of page