Learning R in Practice

Data IO

xtabs: cross tabulation. Suppose the data contain three variables X, Y, and Z. To tabulate the sum of Z for each combination of the X and Y categories:

xtabs(Z ~ X + Y, data = data)
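
As a minimal sketch of what the cross table contains, assume a small hypothetical data frame named toy (the name and values are made up for illustration). Each cell of the result holds the sum of Z for one combination of X and Y.

```r
# Hypothetical toy data: two categorical columns and one numeric column.
toy <- data.frame(
  X = c("a", "a", "b", "b", "a"),
  Y = c("u", "v", "u", "v", "u"),
  Z = c(1, 2, 3, 4, 5)
)

# Rows are the levels of X, columns are the levels of Y,
# and each cell is the sum of Z for that (X, Y) pair.
tab <- xtabs(Z ~ X + Y, data = toy)
print(tab)
```

Here the cell for X = "a", Y = "u" adds the two matching rows (1 and 5).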

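cut2, from the Hmisc library, splits a numeric variable into subgroups. With the g argument it returns a factor with that number of quantile groups.

library(Hmisc)
groups <- cut2(data$var, g = number_of_groups)  # data$var: the numeric column to split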
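
The idea of quantile groups can also be sketched in base R with cut and quantile. This is only an analogue under stated assumptions (a hypothetical vector x and four groups), not how cut2 is implemented; the labels and the handling of ties will differ from what Hmisc produces.

```r
# Hypothetical numeric vector to split.
x <- 1:100
number_of_groups <- 4

# Quantile break points: 0%, 25%, 50%, 75%, 100%.
breaks <- quantile(x, probs = seq(0, 1, length.out = number_of_groups + 1))

# include.lowest = TRUE keeps the minimum value inside the first group.
groups <- cut(x, breaks = breaks, include.lowest = TRUE)

# Number of observations that fall in each group.
counts <- table(groups)
print(counts)
```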

The dplyr library provides a set of functions for working with a data frame much as you would query a database table.

library(dplyr)
df <- as_tibble(origin_df)  # tbl_df() is deprecated since dplyr 1.0.0

select(df, var2:var4)     # keep var2 through var4
select(df, -(var2:var4))  # drop var2 through var4

filter(df, var1 == 1)
filter(df, a <= 3 | b == "IN")  # or; assumes a is numeric, so compare with 3 rather than "3"
filter(df, !is.na(var1))        # is not missing

arrange(df, var1)
arrange(df,desc(var1))

mutate(df, new_var1=var1+var2, new_var2=new_var1^2)
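
The four verbs above can be tried together on a small example. This is a sketch that assumes dplyr is installed; the data frame and the columns total and total_sq are hypothetical and only mirror the naming used in the snippets.

```r
library(dplyr)

# Hypothetical toy data; one row has a missing var1.
df <- data.frame(
  var1 = c(1, 2, 1, NA),
  var2 = c(10, 20, 30, 40),
  var3 = c("a", "b", "c", "d")
)

kept     <- select(df, var1, var2)         # keep two columns
complete <- filter(kept, !is.na(var1))     # drop the row with missing var1
sorted   <- arrange(complete, desc(var2))  # largest var2 first

# A column created in mutate() can be used later in the same call.
result <- mutate(sorted, total = var1 + var2, total_sq = total^2)
print(result)
```

Each verb takes a data frame as its first argument and returns a new one, leaving df itself unchanged.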

summarize(df, avg_var=mean(var))
summarize(df,sd_var=sd(var))

group_by defines groups within a data frame, so that later calls such as summarize are computed once per group.

by_var <- group_by(data, var)

summarize(by_var, avg_var2= mean(var2))
arrange(data, desc(var))
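
To see the effect of grouping, here is a minimal sketch on hypothetical data (the values and the name group_means are assumptions for illustration). summarize returns one row per group instead of a single row for the whole data frame.

```r
library(dplyr)

# Hypothetical toy data: var is the grouping column, var2 is numeric.
data <- data.frame(
  var  = c("a", "a", "b", "b", "b"),
  var2 = c(1, 3, 10, 20, 30)
)

by_var <- group_by(data, var)

# One output row per level of var; n() counts the rows in each group.
group_means <- summarize(by_var, avg_var2 = mean(var2), n = n())
print(group_means)
```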

We can also use the pipe operator %>% to chain these steps into a single data flow.
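
As a sketch of the same kind of grouped summary written with the pipe operator %>%, assuming dplyr is installed and using hypothetical data: each step passes its result to the next function as the first argument, so no intermediate variables are needed.

```r
library(dplyr)

# Hypothetical toy data: var is the grouping column, var2 is numeric.
data <- data.frame(
  var  = c("a", "a", "b", "b", "b"),
  var2 = c(1, 3, 10, 20, 30)
)

result <- data %>%
  filter(!is.na(var2)) %>%             # keep rows with a value for var2
  group_by(var) %>%                    # one group per level of var
  summarize(avg_var2 = mean(var2)) %>% # mean of var2 within each group
  arrange(desc(avg_var2))              # highest average first
print(result)
```

Reading top to bottom gives the order in which the operations run, which is harder to see in nested calls such as arrange(summarize(group_by(...))).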

tidyr is a library for cleaning and tidying datasets.