


#Loading the data

getwd()
[1] "C:/Users/amore/Documents/FOR ME/Data Scientist/Udacity/Data Analysis with R/eda-course-materials/lesson3"
list.files()
[1] "lesson3.Rmd"         "lesson3_student.rmd" "pseudo_facebook.tsv"
pf <- read.csv('pseudo_facebook.tsv', sep='\t')
names(pf)
 [1] "userid"                "age"                   "dob_day"              
 [4] "dob_year"              "dob_month"             "gender"               
 [7] "tenure"                "friend_count"          "friendships_initiated"
[10] "likes"                 "likes_received"        "mobile_likes"         
[13] "mobile_likes_received" "www_likes"             "www_likes_received"   





#Histogram of Users' Birthdays

library(ggplot2)
qplot(x = dob_day, data = pf)

Here are some things that I noticed. On the first day of the month I see a huge bin of almost 8,000 people. This seems unusual, since I would expect roughly the same number of birthdays on each day of the month.



library(ggplot2)
qplot(x = dob_day, data = pf) +
  scale_x_continuous(breaks = 1:31)





#Faceting

library(ggplot2)
qplot(x = dob_day, data = pf) +
  scale_x_continuous(breaks = 1:31) +
  facet_wrap(~dob_month, ncol = 4)

Now, you may have noticed some peaks in May or perhaps in October, but I think what's really interesting is this huge spike on January first. There are almost 4,000 users in this bin. This could be because of the default settings that Facebook uses, or perhaps users are choosing the first choice in the drop-down menus. Another idea is that some users may want to protect their privacy, so they just go with January first by default. Whatever the case may be, it's important that we interpret these values in the context of our data and look out for these types of anomalies.
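A quick sketch to quantify that spike (assuming pf is loaded as above; 366 is just the number of day-month cells in a year, counting February 29th):

# How many users report January 1st as their birthday?
sum(pf$dob_day == 1 & pf$dob_month == 1, na.rm = TRUE)

# For comparison, the count we would expect per (day, month) cell
# if birthdays were spread evenly:
nrow(pf) / 366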



#Friend Count

ggplot(aes(x = friend_count), data = pf) +
  geom_histogram()






#Limiting the Axes

qplot(x = friend_count, data = pf, xlim = c(0,1000))



## alternative solution

qplot(x = friend_count, data = pf) + scale_x_continuous(limits = c(0,1000))




#adjust the bin width

qplot(x = friend_count, data = pf, binwidth = 25) +
  scale_x_continuous(limits = c(0,1000), breaks = seq(0,1000,50))





##splits up the data by gender 

qplot(x = friend_count, data = pf) + 
  scale_x_continuous(limits = c(0,1000), breaks = seq(0,1000,50)) + 
  facet_grid(gender~.)





#Omitting NA Observations

qplot(x = friend_count, data = subset(pf, !is.na(gender)), binwidth = 10) +
  scale_x_continuous(limits = c(0,1000), breaks = seq(0,1000,50)) +
  facet_grid(gender~.)

#equivalent ggplot syntax:

ggplot(aes(x = friend_count), data = pf) +
  geom_histogram() +
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) +
  facet_wrap(~gender)



ggplot(aes(x = friend_count), data = subset(pf, !is.na(gender))) +
  geom_histogram() +
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) +
  facet_wrap(~gender)



##statistics by gender

> table(pf$gender)
female   male 
 40254  58574 
> by(pf$friend_count, pf$gender, summary)
pf$gender: female
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0      37      96     242     244    4923 
--------------------------------------------------------------------- 
pf$gender: male
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0      27      74     165     182    4917 



### Tenure

qplot(x=tenure, data = pf, binwidth = 30,
      color = I('black'), fill = I('#099DD9'))




#Equivalent ggplot syntax:

ggplot(aes(x = tenure), data = pf) +
  geom_histogram(binwidth = 30, color = 'black', fill = '#099DD9')


## How would you create a histogram of tenure by year?


#create a histogram of tenure measured in years rather than in days






#It looks like the bulk of our users had less than two and a half years on Facebook.

qplot(x = tenure / 365, data = pf, binwidth = 0.25,
      color = I('black'), fill = I('#099DD9')) +
  scale_x_continuous(breaks = seq(1, 7, 1), limits = c(0, 7))
#Equivalent ggplot syntax:

ggplot(aes(x = tenure / 365), data = pf) +
  geom_histogram(binwidth = 0.25, color = 'black', fill = '#F79420')



## Labeling Plots


qplot(x = tenure / 365, data = pf, binwidth = 0.25,
      xlab = 'Number of years using Facebook',
      ylab = 'Number of users in sample',
      color = I('black'), fill = I('#099DD9')) +
  scale_x_continuous(breaks = seq(1, 7, 1), limits = c(0, 7))


#Equivalent ggplot syntax:

ggplot(aes(x = tenure / 365), data = pf) +
  geom_histogram(color = 'black', fill = '#F79420') +
  scale_x_continuous(breaks = seq(1, 7, 1), limits = c(0, 7)) +
  xlab('Number of years using Facebook') +
  ylab('Number of users in sample')




## User Ages

> range(pf$age)
[1]  13 113
qplot(x = age, data = pf, binwidth = 5,
      xlab = 'User ages', ylab = 'Number of users in sample',
      color = I('black'), fill = I('blue')) +
  scale_x_continuous(breaks = seq(0, 113, 5), limits = c(13, 113))



#Equivalent ggplot syntax:

ggplot(aes(x = age), data = pf) +
  geom_histogram(binwidth = 1, fill = '#5760AB') +
  scale_x_continuous(breaks = seq(0, 113, 5))





## Transforming Data

> summary(pf$friend_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0    31.0    82.0   196.4   206.0  4923.0 
> summary(log10(pf$friend_count))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   -Inf   1.491   1.914    -Inf   2.314   3.692 
> summary(log10(pf$friend_count + 1))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.505   1.919   1.868   2.316   3.692 
> summary(sqrt(pf$friend_count))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   5.568   9.055  11.090  14.350  70.160 
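This is why the +1 matters in the log10 call above: users with zero friends make log10(0) evaluate to -Inf, which poisons the Min and Mean of the untransformed version. A minimal check:

log10(0)      # -Inf, which is why summary(log10(pf$friend_count)) shows -Inf
log10(0 + 1)  # 0, so the +1 version summarizes cleanly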
#Learn how to use scales and how to create multiple plots on one page. You'll also need to install and load the package gridExtra.

#Create one column with the following three histograms: friend_count, and friend_count transformed using log10 and square root.
> library(gridExtra)
> p1 <- qplot(x = friend_count, data = pf)
> p2 <- qplot(x = log10(friend_count + 1), data = pf)
> p3 <- qplot(x = sqrt(friend_count), data = pf)
> grid.arrange(p1, p2, p3, ncol = 1)
#Equivalent ggplot syntax (we get the same outputs as before):

pp1 <- ggplot(aes(x = friend_count), data = pf) + geom_histogram()
pp2 <- pp1 + scale_x_log10()
pp3 <- pp1 + scale_x_sqrt()
grid.arrange(pp1, pp2, pp3, ncol = 1)




## Add a Scaling Layer


> logScale <- qplot(x = log10(friend_count), data = pf)
> countScale <- ggplot(aes(x = friend_count), data = pf) +
  geom_histogram() +
  scale_x_log10()
> grid.arrange(logScale, countScale, ncol=2)


-> Comparing the two plots, the difference is really in the x-axis labeling. Using scale_x_log10 labels the axis in actual friend_count values, whereas the log10 wrapper labels the x axis in log units. In general it is easier to think in actual counts, which is why people prefer adding scale_x_log10 as a layer.


qplot(x=friend_count, data = pf) + scale_x_log10()





## Frequency Polygons: who has more friends on average, men or women?


#This allows us to see the shape and the peaks of our distribution in more detail.

qplot(x = friend_count, data = subset(pf, !is.na(gender)),
      binwidth = 10) +
  scale_x_continuous(lim = c(0,1000), breaks = seq(0,1000,50)) +
  facet_wrap(~gender)



qplot(x = friend_count, data = subset(pf, !is.na(gender)),
      binwidth = 10, geom = 'freqpoly', color = gender) +
  scale_x_continuous(lim = c(0,1000), breaks = seq(0,1000,50))




#But again, this plot doesn't really answer our question: who has more friends on average, men or women? Let's change the y-axis to show proportions instead of raw counts.



qplot(x = friend_count, y = ..count.. / sum(..count..),
      data = subset(pf, !is.na(gender)),
      xlab = 'Friend Count',
      ylab = 'Proportion of users with that friend count',
      binwidth = 10, geom = 'freqpoly', color = gender) +
  scale_x_continuous(lim = c(0,1000), breaks = seq(0,1000,50))
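One caveat worth flagging: ..count.. / sum(..count..) divides each bin by the total over both genders, so the two curves sum to 1 jointly rather than individually. A hedged alternative for within-gender proportions (each group's density integrates to 1, so density times the bin width of 10 approximates the per-bin share within that gender):

qplot(x = friend_count, y = ..density.. * 10,
      data = subset(pf, !is.na(gender)),
      binwidth = 10, geom = 'freqpoly', color = gender) +
  scale_x_continuous(lim = c(0,1000), breaks = seq(0,1000,50))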







## Likes on the Web

qplot(x = www_likes, data = subset(pf, !is.na(gender)),
      geom = 'freqpoly', color = gender) +
  scale_x_log10()




# What's the www_likes count for males? The first question asks how many www_likes there are in the entire data set for males.


> by(pf$www_likes, pf$gender, sum)
pf$gender: female
[1] 3507665
---------------------------------------------------------------------
pf$gender: male





## Box Plots


qplot(x = gender, y = friend_count,
      data = subset(pf, !is.na(gender)),
      geom = 'boxplot')






## Adjust the code to focus on users who have friend counts between 0 and 1000.

qplot(x = gender, y = friend_count,
      data = subset(pf, !is.na(gender)),
      geom = 'boxplot', ylim = c(0,1000))





#same way:

qplot(x = gender, y = friend_count,
      data = subset(pf, !is.na(gender)),
      geom = 'boxplot') +
  scale_y_continuous(limits = c(0,1000))





-> Notice how the top of the box is just below 250, so it might be around 230. But this value might not be accurate for all of our data: when we use the ylim parameter or the scale_y_continuous layer, we actually remove data points from the calculations.

#So a better way is to use the coord_cartesian layer to set the y limits instead.

qplot(x = gender, y = friend_count,
      data = subset(pf, !is.na(gender)),
      geom = 'boxplot') +
  coord_cartesian(ylim = c(0,1000))




-> Here we set the y limits from 0 to 1000; notice how the top of the box has moved slightly closer to 250 for females.
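To see how much data the ylim/scale_y_continuous approach actually throws away, a quick count (assuming pf as above):

# Users excluded from the boxplot statistics when limits are c(0, 1000):
sum(pf$friend_count > 1000, na.rm = TRUE)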




## Box Plots, Quartiles, and Friendships

qplot(x = gender, y = friend_count,
      data = subset(pf, !is.na(gender)),
      geom = 'boxplot') +
  coord_cartesian(ylim = c(0,250))



> by(pf$friend_count, pf$gender, summary)
pf$gender: female
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0      37      96     242     244    4923 
---------------------------------------------------------------------
pf$gender: male
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0      27      74     165     182    4917 

-> The third quartile, the 75% mark, is at 244, all the way up here (looking at the plot). This means that 75% of female users have a friend count below 244. Another way to say this is that 25% of female users have more than 244 friends.
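That quartile can also be pulled directly instead of being read off the plot; a minimal sketch:

# 75th percentile of friend_count among female users (244 per the summary above):
with(subset(pf, gender == 'female'), quantile(friend_count, probs = 0.75))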



## On average, who initiated more friendships in our sample: men or women?

> names(pf)
 [1] "userid"                "age"                   "dob_day"              
 [4] "dob_year"              "dob_month"             "gender"               
 [7] "tenure"                "friend_count"          "friendships_initiated"
[10] "likes"                 "likes_received"        "mobile_likes"         
[13] "mobile_likes_received" "www_likes"             "www_likes_received"   
qplot(x = gender, y = friendships_initiated,
      data = subset(pf, !is.na(gender)),
      geom = 'boxplot') +
  coord_cartesian(ylim = c(0,150))


> by(pf$friendships_initiated, pf$gender, summary)
pf$gender: female
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0    19.0    49.0   113.9   124.8  3654.0 
---------------------------------------------------------------------
pf$gender: male
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0    15.0    44.0   103.1   111.0  4144.0 


-> On average, who initiated more friendships in our sample: men or women?
Women.


## Getting Logical: what percent of users check in using mobile?

> head(pf$mobile_likes)
[1] 0 0 0 0 0 0
> summary(pf$mobile_likes)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0     0.0     4.0   106.1    46.0 25110.0 
> summary(pf$mobile_likes > 0)
   Mode   FALSE    TRUE    NA's 
logical   35056   63947       0 
> mobile_check_in <- NA
> pf$mobile_check_in <- ifelse(pf$mobile_likes > 0, 1, 0)
> pf$mobile_check_in <- factor(pf$mobile_check_in)
> summary(pf$mobile_check_in)
    0     1 
35056 63947 
> sum(pf$mobile_check_in == 1) / length(pf$mobile_check_in)
[1] 0.6459097
#What percent of users check in using mobile? About 65%.
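Since R treats TRUE as 1 and FALSE as 0, the same percentage also falls out of a one-liner:

mean(pf$mobile_likes > 0)  # 0.6459097, i.e. about 65%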








# Subsetting: viewing only the rows you want

> data(mtcars)
> mean(mtcars$mpg)
[1] 20.09062
> subset(mtcars, mtcars$mpg >= 30 | mtcars$hp < 60)
                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2
> mtcars[mtcars$mpg >= 30 | mtcars$hp < 60, ] # the comma is required to return all columns
                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2
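One practical difference between subset() and bracket indexing that this example doesn't show: when the condition evaluates to NA, subset() silently drops the row while '[' keeps it. A minimal sketch with a hypothetical NA injected into a copy of mtcars:

> df <- mtcars
> df$mpg[1] <- NA              # hypothetical missing value, for illustration only
> nrow(subset(df, mpg >= 30))  # 4: subset() drops the row where the test is NA
> nrow(df[df$mpg >= 30, ])     # 5: '[' keeps it as an extra all-NA row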




# Viewing the reddit data and plotting a categorical variable

> reddit <- read.csv('reddit.csv')
> library(ggplot2)
> qplot(data = reddit, x = age.range)





# Setting the levels of ordered factors


> reddit$age.range = ordered(reddit$age.range, levels = c("Under 18", "18-24", "25-34", "35-44", "45-54", "55-64", "65 or Above"))
> qplot(data = reddit, x = age.range)



#alternative solution

> reddit$age.range = factor(reddit$age.range, levels = c("Under 18", "18-24", "25-34", "35-44", "45-54", "55-64", "65 or Above"), ordered = T)
> qplot(data = reddit, x = age.range)




# practice

> nlevels(reddit$income.range)
[1] 8
> levels(reddit$income.range)
[1] "$100,000 - $149,999" "$150,000 or more"    "$20,000 - $29,999"   "$30,000 - $39,999"  
[5] "$40,000 - $49,999"   "$50,000 - $69,999"   "$70,000 - $99,999"   "Under $20,000"      
> qplot(data = reddit, x = income.range)



#The following approach also works

> reddit$income.range = ordered(reddit$income.range, levels = c("Under $20,000", "$20,000 - $29,999", "$30,000 - $39,999", "$40,000 - $49,999", "$50,000 - $69,999", "$70,000 - $99,999", "$100,000 - $149,999", "$150,000 or more"))
> qplot(data = reddit, x = income.range)




#Another example

> tShirts <- factor(c('medium', 'small', 'large', 'medium', 'large', 'large'), levels = c('medium', 'small', 'large'))
> tShirts
[1] medium small  large  medium large  large 
Levels: medium small large
> qplot(x = tShirts)

> tShirts <- ordered(tShirts, levels = c('small', 'medium', 'large'))
> tShirts
[1] medium small  large  medium large  large 
Levels: small < medium < large
> qplot(x = tShirts)










### Week 5, Chapter 3: Factor Analysis ###

 

The data below come from monitoring the effectiveness of PHI, a free health-screening program. There are 11 screening items measured on 128 people. Let's run a factor analysis in R using the principal factor method.

 

1. Loading the data and summary statistics

> setwd('C:/Rwork')
> med.data = read.table('medFactor.txt', header = T)
> head(med.data)
  lung muscle liver skeleton kidneys heart step stamina stretch blow urine
1   20     16    52       10      24    23   19      20      23   29    67
2   24     16    52        7      27    16   16      15      31   33    59
3   19     21    57       18      22    23   16      19      42   40    61
4   24     21    62       12      31    25   17      17      36   36    77
5   29     18    62       14      26    27   15      20      33   29    88
6   18     19    51       15      29    23   19      20      50   37    54
> boxplot(med.data)

 

2. Running the initial factor analysis

> library(psych)
> library(GPArotation) # for factor rotation
> med.factor = principal(med.data, rotate = 'none') # rotate can be 'none', 'varimax', 'quartimax', or 'oblimin'
> names(med.factor)
 [1] "values"       "rotation"     "n.obs"        "communality"  "loadings"     "fit"        
 [7] "fit.off"      "fn"           "Call"         "uniquenesses" "complexity"   "chi"        
[13] "EPVAL"        "R2"           "objective"    "residual"     "rms"          "factors"    
[19] "dof"          "null.dof"     "null.model"   "criteria"     "STATISTIC"    "PVAL"       
[25] "weights"      "r.scores"     "Structure"    "scores"     
> med.factor$values # eigenvalues
 [1] 3.3791814 1.4827707 1.2506302 0.9804771 0.7688022 0.7330511 0.6403994 0.6221934 0.5283718 0.3519301
[11] 0.2621928
> plot(med.factor$values, type = 'b')

 

>> Looking at med.factor$values, the first three factors have eigenvalues above 1. In the scree plot, the slope flattens out after the fourth factor, so 3-4 factors can be considered meaningful.
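The psych package also provides helpers for this decision; a sketch, assuming med.data is loaded as above:

> scree(med.data)       # scree plot of the eigenvalues
> fa.parallel(med.data) # parallel analysis suggestion for the number of factors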

 

3. Factor rotation

(1) Varimax, an orthogonal rotation

> med.varimax = principal(med.data, nfactors = 3, rotate = 'varimax')
> med.varimax

 

>> Communality is used as a criterion for item selection when factor analysis is run to pick, out of many items (variables), the subset that are related to one another. For example, if a factor analysis of 100 items finds 40 with communality of 0.3 or below, those 40 items can be judged to have little in common with the rest.


>> The unique variance follows directly from u2 = 1 - h2 (communality); for lung, 0.53 = 1 - 0.47.
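Both quantities can be read directly off the fitted object (assuming med.varimax from above):

> med.varimax$communality      # h2 for each screening item
> 1 - med.varimax$communality  # u2, the unique variances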


>> SS loadings give the amount of variance explained by each factor. For RC1, 2.39 = 0.66^2 + 0.11^2 + 0.78^2 + ... + 0.25^2 + (-0.07)^2.


>> Proportion Var is the share of the total variance explained by each factor: RC1 explains 22%, RC2 19%, and RC3 14%. Cumulative Var below it is the running total, so the three factors together explain 56% of the total variation.
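A quick check of that arithmetic against the fitted object (a sketch; the denominator is just the number of variables, 11):

> ss <- colSums(med.varimax$loadings[, 1:3]^2)
> ss                   # SS loadings, e.g. RC1 = 2.39
> ss / ncol(med.data)  # Proportion Var, e.g. RC1 = 0.22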



>> Looking at the loadings of the variables on the rotated factors, the first factor has high loadings on lung, liver, kidneys, and heart, with step relatively high as well. (I'm not entirely sure yet what the loading values mean.)

The second factor has high values for stamina, stretch, blow, and urine, and the third factor loads highly on muscle and skeleton. The first factor can therefore be characterized as biomedical, the second as performance, and the third as muscular-skeletal strength.


>> Unlike principal component analysis, factor analysis expresses the original variables as combinations of the factors:

 Lung = 0.66 x factor1 + 0.12 x factor2 + 0.16 x factor3 + error

 


Factor-score estimation methods include regression and Bartlett's method; below are the factor scores obtained by regression.

> head(med.varimax$scores)
             RC1        RC2         RC3
[1,] -0.11970907 -0.2574867 -0.74473196
[2,]  0.05696634 -0.4784611 -1.53199247
[3,] -0.59153602  0.8957933  1.52598533
[4,]  1.20919164  0.3179760 -0.42022213
[5,]  0.82291043  0.1513448 -0.02988744
[6,] -0.08606120  1.1451913  0.76290104

 


A few ways to visualize the factor analysis:


> plot(med.varimax)
> plot(med.varimax$loadings[, c(1:2)])
> text(med.varimax$loadings[, c(1:2)], colnames(med.data))

> biplot(med.varimax$scores, med.varimax$loadings)
> fa.diagram(med.varimax)

> omega(med.data)

 





3. Factor rotation

(2) Oblimin, an oblique rotation

> med.oblimin = principal(med.data, nfactors = 3, rotate = 'oblimin', scores = T, method = 'regression')
> med.oblimin

 


>> The Oblimin results group the same variables under each factor as Varimax did; only the loadings differ. The factor model for a variable under Oblimin is:

Lung = 0.66 x factor1 + 0.01 x factor2 + 0.1 x factor3

 

>> The factor-score function obtained by regression is:

Factor1 = 0.66 x lung + 0.02 x muscle + ... + (-0.09) x urine
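Those regression weights can be read off the fitted object (a sketch, assuming med.oblimin from above):

> round(med.oblimin$weights[, 1], 2)  # the coefficients behind the Factor1 equation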

 

The factor scores are as follows.

> head(med.oblimin$scores)
            TC1        TC2         TC3
[1,] -0.2325231 -0.2639103 -0.77351112
[2,] -0.1722964 -0.4641659 -1.54539480
[3,] -0.2852637  0.8308965  1.50251465
[4,]  1.1976234  0.4291411 -0.22225725
[5,]  0.8296036  0.2260879  0.09586509
[6,]  0.1766403  1.1289891  0.83993963

 

 

4. Biplot

> biplot(med.varimax)

 

 


