Small Steps Every Day

Moving forward slowly, a little every day.

Statistics & Data Science / R Computing

Installing R Packages, Using RStudio, and Other Advanced Features

mindata1 2025. 3. 2. 09:00

 

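R Packages

  • One of the main reasons to use R is access to the large and varied collection of packages built by users around the world
  • Package: a collection of functions, objects, help pages, data, and so on for carrying out a particular kind of analysis
  • Packages installed by default with R, such as stats, contain the basic statistical functions needed for elementary data analysis
  • library( ): called with no arguments, lists every package installed in the library, both the default ones and any added later
  • search( ): lists the packages currently attached to the session (the search path)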

 

Default R Packages

> library()

Packages in library ‘/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library’:

base             The R Base Package
boot             Bootstrap Functions (Originally
                 by Angelo Canty for S)
class            Functions for Classification
cluster          "Finding Groups in Data": Cluster
                 Analysis Extended Rousseeuw et
                 al.
codetools        Code Analysis Tools for R
compiler         The R Compiler Package
datasets         The R Datasets Package
foreign          Read Data Stored by 'Minitab',
                 'S', 'SAS', 'SPSS', 'Stata',
                 'Systat', 'Weka', 'dBase', ...
graphics         The R Graphics Package
grDevices        The R Graphics Devices and
                 Support for Colours and Fonts
grid             The Grid Graphics Package
KernSmooth       Functions for Kernel Smoothing
                 Supporting Wand & Jones (1995)
lattice          Trellis Graphics for R
MASS             Support Functions and Datasets
                 for Venables and Ripley's MASS
Matrix           Sparse and Dense Matrix Classes
                 and Methods
methods          Formal Methods and Classes
mgcv             Mixed GAM Computation Vehicle
                 with Automatic Smoothness
                 Estimation
misc3d           Miscellaneous 3D Plots
nlme             Linear and Nonlinear Mixed
                 Effects Models
nnet             Feed-Forward Neural Networks and
                 Multinomial Log-Linear Models
parallel         Support for Parallel Computation
                 in R
plot3D           Plotting Multi-Dimensional Data
rpart            Recursive Partitioning and
                 Regression Trees
spatial          Functions for Kriging and Point
                 Pattern Analysis
splines          Regression Spline Functions and
                 Classes
stats            The R Stats Package
stats4           Statistical Functions using S4
                 Classes
survival         Survival Analysis
tcltk            Tcl/Tk Interface
tools            Tools for Package Development
utils            The R Utils Package
> search()
 [1] ".GlobalEnv"        "tools:RGUI"       
 [3] "package:stats"     "package:graphics" 
 [5] "package:grDevices" "package:utils"    
 [7] "package:datasets"  "package:methods"  
 [9] "Autoloads"         "package:base"
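
The two listings above answer different questions: library() shows what is installed, while search() shows what is attached right now. As a rough sketch, the installed packages can be split by the Priority field using installed.packages(). The variable names below are only illustrative, and the result depends on the machine; in the listing above, misc3d and plot3D would most likely fall into the user-installed group.

```r
# Names of installed packages by priority, as recorded in each DESCRIPTION file.
base_pkgs <- rownames(installed.packages(priority = "base"))
recommended_pkgs <- rownames(installed.packages(priority = "recommended"))

# Everything else was added later (for example via install.packages()).
all_pkgs <- rownames(installed.packages())
user_pkgs <- setdiff(all_pkgs, c(base_pkgs, recommended_pkgs))

print(base_pkgs)
print(recommended_pkgs)
print(user_pkgs)

# Packages attached to the current session:
# the entries of search() that start with "package:".
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))
print(attached)
```

A package that is installed but not in `attached` still has to be loaded with library() before its functions can be called by name.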

 

 

Installing R Packages

  • install.packages("package_name")
  • From the R GUI menu on macOS: Packages & Data → Package Installer
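
In scripts it is common to install a package only when it is missing, so that reruns do not download it again. The following is a minimal sketch of that pattern; `install_if_missing` is a hypothetical helper name, not part of R, and the CRAN mirror URL is an assumption that can be replaced with any mirror.

```r
# Hypothetical helper: install `pkg` only if it cannot already be loaded.
# Returns (invisibly) TRUE if an installation was attempted, FALSE otherwise.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns TRUE when the package is available,
  # without attaching it to the search path.
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))
  }
  install.packages(pkg, repos = repos)
  invisible(TRUE)
}

needed_install <- install_if_missing("rpart")
print(needed_install)
```

Setting `repos` explicitly avoids the interactive mirror prompt, which matters when the script is run non-interactively.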

 

 

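Loading (Activating) R Packages

  • library(package_name)
  • Installing is not enough: a package must be loaded in each new session before its functions can be called by name

For example, tree models, one of the data mining techniques, become available once the 'rpart' package is installed and loaded. rpart is a recommended package that ships with most R installations, which is why it already appears in the library() listing above.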
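
As a short illustration of loading and then using a package, the sketch below attaches rpart and fits a classification tree to the kyphosis data set that ships with it. The choice of formula and of `method = "class"` is only an example, not a recommended model.

```r
# Attach rpart so that its functions and lazy-loaded data sets are visible by name.
library(rpart)

# Fit a classification tree: predict the presence of kyphosis after surgery
# from the child's age, the number of vertebrae involved, and the start vertebra.
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis, method = "class")

print(fit)     # text description of the splits
printcp(fit)   # complexity parameter table
```

Without the library(rpart) call, the same function can still be reached as rpart::rpart(), but the bare name rpart would not be found.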

> help(package="rpart") # web-based (HTML) package documentation

 

 

> library(help="rpart") # text-based package documentation

		Information on package ‘rpart’

Description:

Package:            rpart
Priority:           recommended
Version:            4.1.24
Date:               2025-01-06
Authors@R:          c(person("Terry", "Therneau", role = "aut",
                    email = "therneau@mayo.edu"), person("Beth",
                    "Atkinson", role = c("aut", "cre"), email =
                    "atkinson@mayo.edu"), person("Brian", "Ripley",
                    role = "trl", email = "ripley@stats.ox.ac.uk",
                    comment = "producer of the initial R port,
                    maintainer 1999-2017"))
Description:        Recursive partitioning for classification,
                    regression and survival trees.  An
                    implementation of most of the functionality of
                    the 1984 book by Breiman, Friedman, Olshen and
                    Stone.
Title:              Recursive Partitioning and Regression Trees
Depends:            R (>= 2.15.0), graphics, stats, grDevices
Suggests:           survival
License:            GPL-2 | GPL-3
LazyData:           yes
ByteCompile:        yes
NeedsCompilation:   yes
Author:             Terry Therneau [aut], Beth Atkinson [aut, cre],
                    Brian Ripley [trl] (producer of the initial R
                    port, maintainer 1999-2017)
Maintainer:         Beth Atkinson <atkinson@mayo.edu>
Repository:         CRAN
URL:                https://github.com/bethatkinson/rpart,
                    https://cran.r-project.org/package=rpart
BugReports:         https://github.com/bethatkinson/rpart/issues
Packaged:           2025-01-06 13:26:22 UTC; ripley
Date/Publication:   2025-01-07 07:30:14 UTC
Built:              R 4.4.1; aarch64-apple-darwin20; 2025-01-25
                    18:32:09 UTC; unix
Archs:              rpart.so.dSYM

Index:

car.test.frame          Automobile Data from 'Consumer Reports' 1990
car90                   Automobile Data from 'Consumer Reports' 1990
cu.summary              Automobile Data from 'Consumer Reports' 1990
kyphosis                Data on Children who have had Corrective Spinal
                        Surgery
labels.rpart            Create Split Labels For an Rpart Object
meanvar.rpart           Mean-Variance Plot for an Rpart Object
na.rpart                Handles Missing Values in an Rpart Object
path.rpart              Follow Paths to Selected Nodes of an Rpart
                        Object
plot.rpart              Plot an Rpart Object
plotcp                  Plot a Complexity Parameter Table for an Rpart
                        Fit
post.rpart              PostScript Presentation Plot of an Rpart Object
predict.rpart           Predictions from a Fitted Rpart Object
print.rpart             Print an Rpart Object
printcp                 Displays CP table for Fitted Rpart Object
prune.rpart             Cost-complexity Pruning of an Rpart Object
residuals.rpart         Residuals From a Fitted Rpart Object
rpart                   Recursive Partitioning and Regression Trees
rpart.control           Control for Rpart Fits
rpart.exp               Initialization function for exponential fitting
rpart.object            Recursive Partitioning and Regression Trees
                        Object
rsq.rpart               Plots the Approximate R-Square for the
                        Different Splits
snip.rpart              Snip Subtrees of an Rpart Object
solder.balance          Soldering of Components on Printed-Circuit
                        Boards
stagec                  Stage C Prostate Cancer
summary.rpart           Summarize a Fitted Rpart Object
text.rpart              Place Text on a Dendrogram Plot
xpred.rpart             Return Cross-Validated Predictions

Further information is available in the following vignettes in
directory
‘/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/rpart/doc’:

longintro: Introduction to Rpart (source, pdf)
usercode: User Written Split Functions (source, pdf)

 

> data(package = "rpart") # lists the data sets in the package

Data sets in package ‘rpart’:

car.test.frame        Automobile Data from 'Consumer Reports' 1990
car90                 Automobile Data from 'Consumer Reports' 1990
cu.summary            Automobile Data from 'Consumer Reports' 1990
kyphosis              Data on Children who have had Corrective
                      Spinal Surgery
solder                Soldering of Components on Printed-Circuit
                      Boards
solder.balance (solder)
                      Soldering of Components on Printed-Circuit
                      Boards
stagec                Stage C Prostate Cancer
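
Any data set in the list above can be loaded into the workspace without attaching the whole package. The sketch below uses car.test.frame as an arbitrary example and only inspects its structure.

```r
# Load one data set from rpart into the global environment.
# The name is passed through `list =` as a character string.
data(list = "car.test.frame", package = "rpart")

str(car.test.frame)    # column names and types
head(car.test.frame)   # first few rows
```

Once library(rpart) has been called, the data sets are also available directly by name, so the data() call is optional in that case.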

 

 

RStudio

 

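  • An IDE (Integrated Development Environment) that makes working with R more effective
    • IDE: a programming environment that bundles several tools, such as an editor, a compiler, a debugger, and a GUI, into a single program
  • Also supports data management, editing documents and presentation material, HTML work, and more
    • Example) R Markdown: write a results report and embed the output of R computations where needed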

 

Installing RStudio

 

  • Install the free "RStudio Desktop" edition

https://posit.co/download/rstudio-desktop/

 


 

 

Other Advanced Features

 

The built-in help is the best first stop for learning how to use R functions and packages, but the following resources are useful when it is not enough.

  • Search via Google or Stack Overflow (www.stackoverflow.com)
  • Quick-R (www.statmethods.net), provided by DataCamp.com: tutorials on R, statistics, plotting, and more
  • Web-R (www.web-r.org), KOCW (www.kocw.net): information about R in Korean
  • RStudio
    • RStudio online learning: covers a wide range of usage
  • The R Journal: introduces recent packages
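
Before turning to the outside resources above, the built-in help mentioned at the start of this section can also be searched from the console. The sketch below shows two standard search functions; the search patterns are arbitrary examples, and the number of hits depends on which packages are installed.

```r
# Find objects on the search path whose names match a regular expression.
read_funs <- apropos("^read\\.")
print(read_funs)

# Full-text search of the installed help pages, restricted to one package.
# The result is stored rather than printed so that it can be inspected.
hits <- help.search("tree", package = "rpart")
print(nrow(hits$matches))   # number of matching help entries

# In an interactive session, a single help page is opened with
# help("rpart.control", package = "rpart") or the shortcut ?rpart.control
```

apropos() only sees packages that are currently attached, whereas help.search() looks through installed packages whether or not they are loaded.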

 

Recommended Books

  • R for Data Science (Garrett Grolemund & Hadley Wickham): http://r4ds.had.co.nz
    • Introduces a range of packages for data preprocessing, data handling, and plotting
  • Advanced R (Hadley Wickham): http://adv-r.had.co.nz
    • Introduces a variety of tips for using R more deeply as a programming language