Building a data-driven public transportation route recommendation system for smart cities

In the next 5-10 years, rural-to-urban migration is going to create new problems in every country, and public transportation will have the biggest impact on these problems next to affordable housing. So let’s take a look at a data-driven system that can recommend how governments can tackle these issues.

As an everyday commuter, I have been trying different transportation routes in Kuala Lumpur and experiencing the common problems people face, and this is not limited to KL.

Each city has its own public transportation problem, but how do we solve it?

Is Public Transportation a real problem to solve? 

Many big problems are affected by public transportation. I will go into detail on each of them later, but one major issue coming to all metropolitan cities in the next 5-10 years is the rural-to-urban movement of people, which is going to create new problems that governments need to tackle. I believe governments will not solve every problem alone but in collaboration with people, businesses, and non-profit organizations, as a big combined effort is needed to tackle these issues.

The affordable housing crisis (Issue 1), which I wrote about a few weeks ago, is one of the problems that is going to negatively affect people’s everyday lives.

One simple question people like to ask: why not live outside the city when you cannot afford it? Yes and no.

IF YES, people could still choose a home far from their work location as long as public transportation is available. A simple challenge from people to the government should be: if people need to move farther out for housing, then the travel time between work and home should be minimised.

Does that mean everybody needs to have their own car? Yes and no.

IF YES, there will be a traffic problem when everybody starts to drive. If everybody starts to drive, it also becomes an environmental issue (Issue 2).

Importing automobiles (cars and other transport vehicles) could also affect the balance of imports and exports, which in turn affects the currency value (Issue 3).

Of course, this is not a big problem if the country has a flourishing domestic automobile industry. It is similar to personal budget management: if a large share of an individual’s wallet goes to buying a car, taxis, fuel, parking, tolls, and so on, there is less left to spend on education, proper food, and a lot of other things.

So if we want to encourage people to use public transportation, we and our governments need to make sure that:

  • There is proper public transport from home to work
  • The transportation is affordable
  • It is efficient and time saving
  • It takes less effort 🙂
  • It is also friendly and fulfilling

After travelling around the world and experiencing different public transportation systems, I am currently living in Kuala Lumpur, where I force myself to commute by public transport every day. All the exposure I had while travelling gives me hope that every country can have the proper infrastructure to tackle the urban movement crisis. A predicted crisis should be used as an opportunity to solve it.

So how do we solve this problem with current data-driven technologies?

Every morning and evening I take public transport, and it easily takes 1-2 hours one way for a 14 km trip. If I make the same trip by car, it takes less than 30 minutes when the traffic is not too bad. That’s just one example. Here is the difference:

Table1 goes here
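To put the commute figures into comparable terms, here is a minimal sketch that converts them into average speeds. The function name `average_speed_kmh` is only illustrative, and the inputs are the rough figures from my own commute (14 km, 1-2 hours by public transport, about 30 minutes by car), not measured data.

```python
def average_speed_kmh(distance_km: float, minutes: float) -> float:
    """Average door-to-door speed in km/h for a trip of the given length and duration."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return distance_km / (minutes / 60)


if __name__ == "__main__":
    trip_km = 14  # one-way distance from the example above
    print("Public transport, 1 hour :", average_speed_kmh(trip_km, 60), "km/h")
    print("Public transport, 2 hours:", average_speed_kmh(trip_km, 120), "km/h")
    print("Car, 30 minutes          :", average_speed_kmh(trip_km, 30), "km/h")
```

On these rough numbers, the public transport trip averages somewhere between 7 and 14 km/h, while the car trip averages about 28 km/h or better.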

I know my journey alone is not a good enough sample to decide anything. So I have been trying different routes in different places, and it has not been a very smooth experience. But we need to prove this with data from a bigger pool of users, so we are working on finding the travel time for every person.

Two points – Source (Home) <> Destination (Work) (SCORE 1)

We will have datasets comprising a home GPS point (source) and a work GPS point (destination) for each person, as this will help to calculate the distance of each route variation. The algorithm automatically generates multiple route variations between the two points and ranks each route with a Good, Bad, or Needs Improvement label.
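As a first building block, the straight-line distance between the home and work GPS points can be estimated with the haversine formula. This is only a sketch under my own assumptions: the function name `haversine_km` is illustrative, the Earth is treated as a sphere of radius 6371 km, and a real route will always be longer than this straight-line figure, so it serves as a lower bound rather than a route length.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)

    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Comparing this lower bound with the actual length and travel time of each generated route gives a rough sense of how much of a detour the route takes.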

How is the system going to benchmark the routes?

Initially we are going to set a route success factor:

  • If a route’s time consumption is more than 100% of the success factor, it shows FAIL.
  • If a route’s time consumption is more than 50% but not more than 100% of the success factor, it shows NEEDS IMPROVEMENT.
  • If a route’s time consumption is less than 50% of the success factor, it shows SUCCESS.
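The benchmark rules above can be sketched as a small function. I am assuming here that the success factor is expressed as a time in minutes, and that a route sitting at exactly 50% counts as SUCCESS, since the rules above do not cover that boundary. The name `classify_route` is only illustrative.

```python
def classify_route(travel_minutes: float, success_factor_minutes: float) -> str:
    """Label a route by comparing its travel time to the success factor."""
    if success_factor_minutes <= 0:
        raise ValueError("success_factor_minutes must be positive")

    ratio = travel_minutes / success_factor_minutes
    if ratio > 1.0:
        return "FAIL"               # more than 100% of the success factor
    if ratio > 0.5:
        return "NEEDS IMPROVEMENT"  # more than 50%, up to and including 100%
    return "SUCCESS"                # 50% or less (the exact-50% case is an assumption)


if __name__ == "__main__":
    # Hypothetical success factor of 60 minutes
    for minutes in (90, 45, 20):
        print(minutes, "min ->", classify_route(minutes, 60))
```

Keeping the thresholds as ratios means the same function works whatever success factor is eventually chosen for a city or a route.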


SUPPLY AND DEMAND by population (SCORE 2) 

Next, the algorithm includes population data: the population within every area about 1 km across (roughly 1 million square meters of land). This is useful for understanding the demand from a particular point.

If the population around a particular source point is higher than a specified number (####), it needs more buses than normal. If the number of buses originating from a particular point is higher than normal, it needs a different approach, which is covered in the MRT/LRT system recommendation.
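A minimal sketch of this population rule follows. The population threshold and the "normal" bus count are not fixed yet, so both are passed in as parameters instead of being hard-coded. The function name, the returned labels, and the choice to check the bus count before the population are all my own assumptions for illustration.

```python
def population_recommendation(population: int,
                              population_threshold: int,
                              buses_from_point: int,
                              normal_bus_count: int) -> str:
    """Suggest an action for one source point based on population and bus supply."""
    # Already more buses than normal: buses alone may not be the answer,
    # so hand the point over to the MRT/LRT recommendation.
    if buses_from_point > normal_bus_count:
        return "CONSIDER_MRT_LRT"
    # Population above the specified threshold: demand calls for extra buses.
    if population > population_threshold:
        return "ADD_BUSES"
    return "NO_CHANGE"
```

The thresholds would come from the government population dataset and the operator's own definition of a normal service level.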

SUPPLY AND DEMAND by time variance (SCORE 3) 

Based on the supply and demand of each of these routes over time, a recommendation system tells us how many buses or other vehicles are needed on each route at any given time.
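One simple way to turn demand over time into a vehicle count is to divide the expected passengers in each time slot by the capacity of one bus and round up. This is a sketch under stated assumptions: demand is given per hour of the day, every bus makes one trip per slot, and the names `buses_needed` and `buses_by_hour` are illustrative.

```python
def buses_needed(passengers: int, bus_capacity: int) -> int:
    """Smallest number of buses that can carry the given passengers in one slot."""
    if bus_capacity <= 0:
        raise ValueError("bus_capacity must be positive")
    if passengers <= 0:
        return 0
    # Integer ceiling division: avoids floating-point rounding surprises.
    return -(-passengers // bus_capacity)


def buses_by_hour(demand_by_hour: dict[int, int], bus_capacity: int) -> dict[int, int]:
    """Map each hour of the day to the number of buses needed on the route."""
    return {hour: buses_needed(passengers, bus_capacity)
            for hour, passengers in demand_by_hour.items()}
```

A real system would also account for trip duration and how many round trips one bus can make in a slot, which this sketch leaves out.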


MRT/LRT System Recommendation

In 1-2 years, once you start to see that the demand on a bus route has reached its capacity, that is the time to opt for LRT trains or other modes of transportation.

Experience Tracking

In addition to the automated data we collect, we will gather feedback from people through what I call a listening system, where they will be able to provide information such as:

  • Condition of the whole experience 
  • Hospitality of the bus driver by Driver Name
  • Cleanliness of the bus by Bus number
  • Condition of the bus stops by bus stop name or stop number
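The feedback items above could be captured in a simple record, one per trip. This is only a sketch: the field names, the 1-5 rating scale, and the helper `average_cleanliness_by_bus` are my own assumptions about how the listening system might store and summarise the feedback.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Feedback:
    overall_experience: int   # condition of the whole experience, 1 (worst) to 5 (best)
    driver_name: str
    driver_hospitality: int   # 1 to 5
    bus_number: str
    bus_cleanliness: int      # 1 to 5
    bus_stop: str             # bus stop name or stop number
    bus_stop_condition: int   # 1 to 5


def average_cleanliness_by_bus(feedbacks: list[Feedback]) -> dict[str, float]:
    """Average cleanliness rating for each bus number."""
    ratings: dict[str, list[int]] = defaultdict(list)
    for fb in feedbacks:
        ratings[fb.bus_number].append(fb.bus_cleanliness)
    return {bus: mean(values) for bus, values in ratings.items()}
```

The same grouping pattern would work for hospitality by driver name or condition by bus stop.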

Data Flow Architecture 

Layer 1 Government Data set of population | Layer 2 Government Data set of 

Infographic goes here