Job Hunting Like A Data Analyst (part III) – A Simple Recommender

Mission

Continuing from the previous post, “Explore the Job Market”, this week I am going to develop a simple recommender system to find a suitable job.

Recommender

Let’s start with some background on recommendation systems. Typical examples are the products recommended in the sidebar on Amazon, or “People You May Know” on Facebook.
Recommenders are usually categorised into two types:

1. Content Based Recommendation:

Content-based recommendation can be user-based or product-based; the choice depends on the mission of the business and on how sparse the information is. A set of attributes is required to characterise the product or user, in order to make customised and accurate predictions.
The problem is that sometimes there is no good way to automatically assign attributes to a new product or user.
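To make the idea concrete, here is a minimal sketch of content-based matching, assuming each job and each candidate is described by a set of skills. The skill list, the job names and the `recommend` helper are all made up for illustration; this is not the recommender built in the full post.

```python
from math import sqrt

# Hypothetical skill vocabulary and job postings, invented for illustration.
SKILLS = ["python", "sql", "excel", "tableau", "r"]

jobs = {
    "analyst_a": {"python", "sql"},
    "analyst_b": {"excel", "tableau"},
    "analyst_c": {"python", "sql", "r"},
}

def to_vector(skill_set):
    """Encode a set of skills as a 0/1 vector over SKILLS."""
    return [1.0 if s in skill_set else 0.0 for s in SKILLS]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(profile, jobs, top_n=2):
    """Rank jobs by similarity between their skills and the profile."""
    p = to_vector(profile)
    scored = [(cosine(p, to_vector(skills)), name)
              for name, skills in jobs.items()]
    # Highest similarity first; ties broken alphabetically by job name.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored[:top_n]]

candidate = {"python", "sql", "excel"}
ranking = recommend(candidate, jobs)
```

Note that a brand-new job only shows up in the ranking once someone has filled in its skill set, which is exactly the attribute problem mentioned above.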

2. Collaborative Filtering:

As the name implies, collaborative filtering makes use of both user and product information to make the recommendation, typically in the form of the interactions between them (ratings, clicks, purchases).
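A rough sketch of one common flavour, user-based collaborative filtering, is below. The users, jobs and ratings are invented, and using plain cosine similarity over co-rated items is a simplifying assumption on my part, not necessarily what the full post does.

```python
from math import sqrt

# Made-up ratings (1-5) of jobs by users; a missing key means "not rated".
ratings = {
    "alice": {"job1": 5, "job2": 3, "job3": 4},
    "bob":   {"job1": 5, "job2": 3, "job4": 5},
    "carol": {"job1": 1, "job2": 5, "job4": 2},
}

def similarity(a, b):
    """Cosine similarity over the items that both users rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in common)
    na = sqrt(sum(ratings[a][i] ** 2 for i in common))
    nb = sqrt(sum(ratings[b][i] ** 2 for i in common))
    return dot / (na * nb)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for item."""
    num = den = 0.0
    for other in ratings:
        if other == user or item not in ratings[other]:
            continue
        w = similarity(user, other)
        num += w * ratings[other][item]
        den += w
    return num / den if den else None

# Alice has not rated job4; estimate her rating from Bob's and Carol's.
score = predict("alice", "job4")
```

Because Alice's ratings line up with Bob's much more closely than with Carol's, the estimate is pulled towards Bob's rating of job4. No job attributes are needed at all, only the rating history.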


Job Hunting Like A Data Analyst (Part II)

Content

Continuing from the previous post, I’ve added a few lines of code to fetch the job description of each job post. This takes a bit longer, about 1.5 hours for me, because I set a delay of ~10 seconds between requests.
This week I will continue with an overview of the Data Analyst job market and develop a simple recommender based on skill and experience requirements.
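For reference, the delay between requests can be sketched roughly as follows. The `fetch_all` helper, the `fake_fetch` stand-in and the example.com URLs are hypothetical and only there so the snippet runs offline; a real `fetch` would perform the HTTP request.

```python
import time

def fetch_all(urls, fetch, delay_seconds=10.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # wait before every request except the first
        pages[url] = fetch(url)
    return pages

# Stand-in for a real HTTP GET, so the sketch needs no network access.
def fake_fetch(url):
    return "<html>description for %s</html>" % url

urls = [
    "https://example.com/job/1",
    "https://example.com/job/2",
    "https://example.com/job/3",
]

# Record the requested waits instead of actually sleeping.
waits = []
pages = fetch_all(urls, fake_fetch, sleep=waits.append)
```

At about 10 seconds per request, 1.5 hours (5,400 seconds) works out to something on the order of 500 requests, ignoring the time each request itself takes.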

Tools

python 2.x
python package: pandas
python standard-library module: re
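As a rough idea of how pandas and re might work together on job descriptions, here is a small sketch that pulls skill keywords and a years-of-experience figure out of free text. The sample descriptions, the `description` column name, the skill list and both helper functions are assumptions made for illustration, not the code from this post.

```python
import re
import pandas as pd

# Made-up job descriptions; the column name "description" is an assumption.
jobs = pd.DataFrame({
    "description": [
        "Strong SQL and Python skills, 3+ years of experience.",
        "Advanced Excel; SQL is a plus. At least 2 years experience.",
        "Python, R and Tableau.",
    ]
})

# Hypothetical list of skills to look for.
SKILLS = ["python", "sql", "excel", "tableau", "r"]

def find_skills(text):
    """Return the skills from SKILLS that appear as whole words in text."""
    found = []
    for skill in SKILLS:
        pattern = r"\b" + re.escape(skill) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(skill)
    return found

def find_years(text):
    """Return the first 'N years' figure in text, or None if absent."""
    m = re.search(r"(\d+)\+?\s*years?", text, flags=re.IGNORECASE)
    return int(m.group(1)) if m else None

jobs["skills"] = jobs["description"].apply(find_skills)
jobs["min_years"] = jobs["description"].apply(find_years)
```

The word boundaries (`\b`) matter for a short skill name like R; without them the pattern would match the letter inside almost every other word.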

1. Job Market Overview

Data Preparation:

After some time spent web scraping, we have a fairly clean dataset in which each posting comes with a big chunk of job description text. What I got looks something like this:


kaggle – Titanic

This is my first blog post about my journey of learning data science, which starts with the first Kaggle competition I attempted: the Titanic. In this competition, we are asked to predict the survival of passengers on board, given some information such as age, gender, ticket fare…

Translated letter reveals first hand account of the “unforgettable scenes where horror mixed with sublime heroism” as the Titanic sank Photo: Getty Images

How bad is this tragedy?

Let’s do some exploratory data analysis to look at the big picture of the tragedy. Looking at the bar charts, we can see that roughly 38% of the Titanic passengers survived. A much larger percentage of male passengers perished than female passengers. First, second and third class account for about 25%, 25% and 50% of passengers respectively, and third-class passengers were much more vulnerable. The Titanic had a variety of decks, ranging from the top deck (T) to the bottom deck (G). Survival likelihood does not seem to differ much between decks.
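Numbers like these can be computed with a few pandas group-bys. The sketch below uses the column names from the Kaggle training file (`Survived`, `Sex`, `Pclass`), but the eight rows are toy values invented for illustration, so the resulting rates are not the real ones quoted above.

```python
import pandas as pd

# Toy rows, invented for illustration; the real data is Kaggle's train.csv.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1, 0, 0],
    "Sex": ["male", "female", "female", "male",
            "male", "female", "male", "female"],
    "Pclass": [3, 1, 2, 3, 1, 1, 3, 3],
})

# Survived is 0/1, so its mean is the survival rate.
overall_rate = df["Survived"].mean()
rate_by_sex = df.groupby("Sex")["Survived"].mean()
rate_by_class = df.groupby("Pclass")["Survived"].mean()

# Share of passengers travelling in each class.
class_share = df["Pclass"].value_counts(normalize=True).sort_index()
```

With the real file, `df = pd.read_csv("train.csv")` would replace the toy frame, assuming the file sits in the working directory.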

Some other facts

The majority of passengers on board were adults aged 20 to 40. There is a spike in the survived portion for children and another spike in the perished portion for young people aged 20 to 30. Ticket fares were mostly below £50, and passengers with cheaper tickets were more likely to die.


Chaoran's Data Story

data science self-learning repository

Category: exploratory analysis