Showing posts with label R. Show all posts

Thursday, July 25, 2019

Problem solving using R and Python: a caution



Recently, I had the task of identifying repeat customers. For instance, customer A purchases a product on 17th May and returns on 22nd May, 24th May and so on. Likewise, customer B purchases a product on 18th May and purchases similar products on later dates. I had data on 3000+ customers and a repeat-visit list of 21000 entries. I needed a day-wise list of repeat visits, something like this. The expected outcome is: "which date of first purchase yielded the maximum number of repeat customers?" Later we could identify the revenue.
Date of first purchase / Repeat    18 May    19 May    20 May
17 May                             3         4         7
18 May                             12        6         5
19 May                             10        4         2
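The tabulation described above can be sketched in plain Python. This is only an illustration and not the code used for the actual analysis: the sample customers, the dates, and the function names `daywise_repeats` and `best_first_day` are all assumptions made up for this example.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data: customer id -> date of first purchase
first_purchase = {
    "A": date(2019, 5, 17),
    "B": date(2019, 5, 18),
    "C": date(2019, 5, 18),
}

# Hypothetical repeat-visit list: (customer id, date of repeat visit)
repeat_visits = [
    ("A", date(2019, 5, 22)),
    ("A", date(2019, 5, 24)),
    ("B", date(2019, 5, 22)),
    ("C", date(2019, 5, 23)),
]

def daywise_repeats(first_purchase, repeat_visits):
    """Count repeat visits per (first purchase date, repeat date) pair."""
    table = Counter()
    for customer, visit_day in repeat_visits:
        first_day = first_purchase.get(customer)
        if first_day is None:
            continue  # visit by a customer missing from the purchase list
        table[(first_day, visit_day)] += 1
    return table

def best_first_day(first_purchase, repeat_visits):
    """Return the first-purchase date with the most distinct repeat customers."""
    customers_by_day = {}
    for customer, _ in repeat_visits:
        first_day = first_purchase.get(customer)
        if first_day is not None:
            customers_by_day.setdefault(first_day, set()).add(customer)
    return max(customers_by_day, key=lambda day: len(customers_by_day[day]))

table = daywise_repeats(first_purchase, repeat_visits)
best_day = best_first_day(first_purchase, repeat_visits)
```

Each key of `table` is one cell of the day-wise table (row = first purchase date, column = repeat date), and `best_day` answers the question of which first-purchase date brought back the most customers.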

I used Python to read both lists, then coded and computed the day-wise list. Out of curiosity, I used the merge command in R and tabulated the day-wise counts to cross-check. To my shock, there was a difference of 25%. It spoiled my enthusiasm for doing something useful, as there was a substantial difference which should not have been there. I then compared the day-wise tabulated values obtained from Python with the values obtained from R.
This simple piece of work cost me a few hours, but the learning is forever, which is why I thought of sharing it.
Findings: There is a marginal difference in every row. This is because the machine/data source duplicated the data while capturing the repeat customers. Cross-checking is always required, maybe with a different tool, strategy or approach. The duplication could have resulted in wrong computations and projections, but I have avoided them in full.
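A quick way to spot this kind of duplication is to compare the raw day-wise counts with the counts after dropping exact duplicate records. The sketch below is a minimal illustration, assuming each repeat visit is a (customer id, date) pair; the data and the function names `count_by_day` and `duplicate_report` are hypothetical.

```python
from collections import Counter
from datetime import date

def count_by_day(visits):
    """Count visit records per date."""
    return Counter(day for _, day in visits)

def duplicate_report(visits):
    """Return {date: (raw count, deduplicated count)} for dates where they differ."""
    raw = count_by_day(visits)
    unique = count_by_day(set(visits))  # set() drops exact duplicate records
    return {
        day: (raw[day], unique[day])
        for day in sorted(raw)
        if raw[day] != unique[day]
    }

# Hypothetical repeat-visit list in which customer A's visit on 22 May was captured twice
visits = [
    ("A", date(2019, 5, 22)),
    ("A", date(2019, 5, 22)),
    ("A", date(2019, 5, 24)),
    ("B", date(2019, 5, 22)),
]

report = duplicate_report(visits)
```

An empty `report` means the raw and deduplicated counts agree for every day. Any entry points to a date where the source data needs a closer look.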

Sunday, March 11, 2018

Analysing the employment data from Indian railways

Source data from GOI
Credits: Ramprasath

Here is an interesting data set to analyse.

How much employment has Indian Railways generated in the past 15 years? Here are the insights.
The code was developed in R.
Data preparation:
The rows and columns were transposed to make the data easy to read.

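setwd('d:/suman')
job <- read.csv('railwayjob.csv')
typeof(job)  # a data frame reports "list"

# Rows scanned for each column; column 1 holds the row label
rows <- 2:19

for (j in 2:15)
{
  maxval <- max(job[rows, j], na.rm = TRUE)
  minval <- min(job[rows, j], na.rm = TRUE)
  # Look up the max/min only within the same rows they were computed from
  y <- rows[which(job[rows, j] == maxval)]
  z <- rows[which(job[rows, j] == minval)]
  ans <- job[y, 1]
  ans2 <- job[z, 1]
  print(paste("sl = ", j, ", min = ", ans2))
  print(paste("sl = ", j, ", max = ", ans))
}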

This is a simple data set covering a period of 15 years, with 20 rows. Each row represents a railway zone.

The results are interesting:
The Northern zone contributes the most to employment generation.
Kolkata contributes the least. More visualization can be added to this.


Wednesday, March 7, 2018

An algorithm for assignment problem in R

A small algorithm was written in R to identify the optimum assignment in a ship.
It was tested on assigning loads to 3 trucks in a ship, and could be expanded to n ships and their combinations. The algorithm was tested with several combinations of values. Comments and suggestions for improvement are welcome.
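For readers unfamiliar with the assignment problem, here is a minimal brute-force sketch in Python. It is not the R algorithm mentioned above. It assumes the classic formulation in which each load goes on exactly one truck and the total cost is to be minimised; the cost matrix and the function name `best_assignment` are hypothetical.

```python
from itertools import permutations

def best_assignment(cost):
    """Try every one-to-one assignment and keep the cheapest.

    cost[i][j] is the (hypothetical) cost of putting load i on truck j.
    The matrix is assumed to be square.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        # perm[i] is the truck that load i is assigned to
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total

# Hypothetical costs for 3 loads and 3 trucks
cost = [
    [9, 2, 7],
    [6, 4, 3],
    [5, 8, 1],
]

assignment, total = best_assignment(cost)
```

Here `assignment[i]` is the index of the truck chosen for load `i`. Checking every permutation is fine for 3 trucks, but the number of permutations grows as n!, so a larger problem would need a dedicated method such as the Hungarian algorithm.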

Assignment problem link