R
# Basic operations
2 + 3 # Addition: [1] 5
7 - 5 # Subtraction: [1] 2
3 * 5 # Multiplication: [1] 15
3 / 4 # Division: [1] 0.75
2^3 # Exponentiation: [1] 8
2:4 # Sequence: [1] 2 3 4
Summary of R Material
UI Mathematics Lab Assistant Team
November 10, 2025
After completing this module, students are expected to be able to:
Congratulations! You have completed almost the entire learning journey of Pengantar Sains Data (Introduction to Data Science). Now it is time to consolidate everything you have learned.
Imagine you are a Data Analyst at a consulting firm, asked to carry out a comprehensive analysis of various client datasets. To do that, you need to master:
This module is designed as a comprehensive review that helps you recall and integrate all of those skills.
R is a powerful programming language for data analysis. Let's start with basic operations:
A vector is a collection of values of the same type and is the fundamental data structure in R.
[1] 85 92 78 88 95
[1] "apel" "jeruk" "pisang"
[1] "apel"
[1] 92 78 88
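The four outputs above are consistent with creating two vectors and indexing them. A minimal sketch, assuming the variable names `nilai` and `buah` (hypothetical, since the original code is not shown here):

```r
# Numeric vector of scores (the name `nilai` is an assumption)
nilai <- c(85, 92, 78, 88, 95)
# Character vector of fruit names (the name `buah` is an assumption)
buah <- c("apel", "jeruk", "pisang")

print(nilai)       # the whole numeric vector
print(buah)        # the whole character vector
print(buah[1])     # first element; R indexes from 1
print(nilai[2:4])  # elements 2 through 4
```

Indexing with a range such as `2:4` returns a new vector, so the result can be passed straight to functions like `mean()`.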
The Six Basic Vector Types
There are six basic types, but four of them cover about 99% of everyday use.
Double is the default type for numbers in R. It can hold decimal values.
If you are sure you will only work with whole numbers, you can create integers explicitly by appending L to the number. This saves memory.
Character vectors store text. Values are always enclosed in double quotes (") or single quotes (').
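To make the types concrete, the sketch below checks each of the four commonly used ones with `typeof()`. The variable names are illustrative only.

```r
x_dbl <- 3.14    # double: the default for numbers
x_int <- 5L      # integer: note the L suffix
x_chr <- "data"  # character: text in quotes
x_lgl <- TRUE    # logical: TRUE or FALSE

print(typeof(x_dbl))
print(typeof(x_int))
print(typeof(x_chr))
print(typeof(x_lgl))
print(typeof(5))  # without the L suffix, a whole number is still a double
```

The remaining two atomic types, complex and raw, rarely appear in routine data analysis.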
Sampling is essential for statistical analysis:
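A short sketch of random sampling with base R's `sample()`. The population `1:100`, the seed, and the variable names are assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed, only to make the draws reproducible
pop <- 1:100  # hypothetical population

# Simple random sample of 10 values without replacement
without_repl <- sample(pop, size = 10)
# Sample of 10 values with replacement (duplicates are possible)
with_repl <- sample(pop, size = 10, replace = TRUE)

print(without_repl)
print(with_repl)
```

Sampling without replacement never repeats a value, which is the usual choice when drawing survey respondents. Sampling with replacement is what bootstrap-style simulations rely on.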
Packages extend R's functionality. To import data, we need dedicated packages.
Importing data from various formats:
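As a self-contained illustration, the sketch below writes a small CSV to a temporary file and reads it back with base R's `read.csv()`. The temporary file stands in for a real data file. Other formats such as Excel typically need an add-on package (for example `readxl`), which is not used here.

```r
# Write a small CSV to a temporary location (stand-in for a real file)
tmp <- tempfile(fileext = ".csv")
write.csv(head(iris), tmp, row.names = FALSE)

# Import the CSV into a data frame
dat <- read.csv(tmp)
str(dat)

unlink(tmp)  # clean up the temporary file
```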
Let's use the iris dataset that ships with R:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
Data structure information:
[1] 150 5
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Statistical summary:
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
setosa :50
versicolor:50
virginica :50
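The previews, dimensions, column names, structure, and summary shown above look like the output of the standard inspection functions. A sketch of calls that would produce output of this kind:

```r
data(iris)

head(iris)     # first six rows
dim(iris)      # number of rows and columns
names(iris)    # column names
str(iris)      # type and preview of each column
summary(iris)  # five-number summary plus mean, counts for factors
```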
Specific descriptive statistics:
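A minimal sketch of individual descriptive statistics, assuming the column of interest is `Sepal.Length` (any numeric column works the same way):

```r
sl <- iris$Sepal.Length

print(mean(sl))    # arithmetic mean
print(median(sl))  # median
print(sd(sl))      # sample standard deviation
print(var(sl))     # sample variance
print(range(sl))   # minimum and maximum
print(quantile(sl, probs = c(0.25, 0.75)))  # first and third quartiles
```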
Subsetting/filtering data:
R
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
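The attach message above indicates that dplyr was loaded for filtering. The sketch below shows the same idea in base R so that it runs without extra packages. The conditions are illustrative assumptions, and the dplyr equivalent appears only as a comment.

```r
# Rows where Species is "setosa" (logical indexing)
setosa <- iris[iris$Species == "setosa", ]

# Rows with Sepal.Length above 7, keeping two columns
long_sepal <- subset(iris, Sepal.Length > 7,
                     select = c(Sepal.Length, Species))

print(nrow(setosa))
print(long_sepal)

# With dplyr installed, the first filter could be written as:
#   dplyr::filter(iris, Species == "setosa")
```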
Visualization helps us understand the distributions and patterns in data.
Creating a frequency table:
R
Species Freq
1 setosa 50
2 versicolor 50
3 virginica 50
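A frequency table like the one above can be built with `table()` and converted to a data frame. This is a sketch, and the column renaming is an assumption made to match the printed headers.

```r
# Count observations per species, then convert to a data frame
freq <- as.data.frame(table(iris$Species))
names(freq) <- c("Species", "Freq")
print(freq)

# The same counts as a simple bar chart
barplot(table(iris$Species), main = "Species counts", ylab = "Frequency")
```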
ggplot2 is a very powerful package for visualization in R.
For every distribution, R provides four main functions:
d...(): density function (PMF for discrete distributions, PDF for continuous ones)
p...(): cumulative distribution function (CDF)
q...(): quantile function (inverse CDF)
r...(): random number generator (RNG)
[1] 0.3
[1] 0.7
[1] 0.3
[1] 1
[1] 0
[1] 0 1 0 0 1 0 0 1 1 0
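The first outputs above are consistent with a Bernoulli distribution, which R expresses as a binomial with `size = 1`. The success probability 0.7 is inferred from those values and is an assumption, as are the seed and the quantile level.

```r
p_success <- 0.7  # assumed success probability

print(dbinom(0, size = 1, prob = p_success))    # P(X = 0)
print(dbinom(1, size = 1, prob = p_success))    # P(X = 1)
print(pbinom(0, size = 1, prob = p_success))    # P(X <= 0)
print(pbinom(1, size = 1, prob = p_success))    # P(X <= 1)
print(qbinom(0.5, size = 1, prob = p_success))  # median of the distribution

set.seed(1)  # arbitrary seed; the draws will differ from the output above
draws <- rbinom(10, size = 1, prob = p_success)
print(draws)
```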

[1] 0.001446701
[1] 0.001590386
[1] 5
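The first two values above match a Binomial(10, 0.7) distribution evaluated at x = 2, and the plot title further down uses the same parameters. A sketch under that assumption, which also defines the `n` and `p` that the code below relies on:

```r
n <- 10   # number of trials (taken from the plot title)
p <- 0.7  # success probability (taken from the plot title)

pmf_2 <- dbinom(2, size = n, prob = p)  # P(X = 2)
cdf_2 <- pbinom(2, size = n, prob = p)  # P(X <= 2)

print(pmf_2)
print(cdf_2)
```

The third output, 5, is presumably a quantile from `qbinom()`, but the probability level used is not recoverable from the text, so it is left out.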
# Random generation
random_binom <- rbinom(100, size = n, prob = p)
# Plot the PMF
x <- 0:n
plot(x, dbinom(x, size = n, prob = p),
type = "h", lwd = 5, col = "darkgreen",
main = "PMF Binomial(10, 0.7)",
xlab = "x", ylab = "P(X = x)")
points(x, dbinom(x, size = n, prob = p), pch = 19, cex = 2, col = "darkgreen")

[1] 0.2240418
[1] 0.4231901
[1] 3
[1] 5 7 4 3 4 1 2 3 5 4 1 0 4 1 4 4 2 3 5 5 5 1 7 5 7 2 1 1 2 3 2 2 6 3 1 7 6
[38] 3 3 2 7 4 1 7 3 6 1 0 3 1 2 2 1 2 3 5 3 1 1 3 3 5 5 2 3 4 1 2 3 3 8 5 3 1
[75] 4 3 4 3 2 4 2 2 3 3 5 3 4 1 5 4 3 2 2 3 3 2 2 3 3 2
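The values above agree with a Poisson distribution with rate 3 evaluated at x = 2. The rate, the evaluation point, and the quantile level in this sketch are inferred from the numbers, not stated in the text.

```r
lambda <- 3  # assumed rate parameter

pmf_2 <- dpois(2, lambda = lambda)    # P(X = 2)
cdf_2 <- ppois(2, lambda = lambda)    # P(X <= 2)
med   <- qpois(0.5, lambda = lambda)  # median

print(pmf_2)
print(cdf_2)
print(med)

set.seed(1)  # arbitrary seed; the draws will differ from the output above
random_pois <- rpois(100, lambda = lambda)
print(mean(random_pois))  # should be near lambda
```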

R

[1] 0.1666667
[1] 0.6666667
[1] 3
[1] 2 1 5 1 5 4 1 3 3 1
[1] 3.5
[1] 2.916667
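The probabilities 1/6 and 4/6 and the moments 3.5 and 2.916667 above are those of a fair six-sided die, that is, a discrete uniform distribution on 1 to 6. Base R has no `dunif`-style family for the discrete case, so this sketch computes the quantities directly. The specific events are assumptions.

```r
x  <- 1:6          # faces of the die
px <- rep(1/6, 6)  # equal probability for each face

p_one    <- px[x == 1]          # P(X = 1)
p_le4    <- sum(px[x <= 4])     # P(X <= 4)
ex       <- sum(x * px)         # expected value
var_x    <- sum((x - ex)^2 * px)  # variance

print(p_one)
print(p_le4)
print(ex)
print(var_x)

set.seed(1)  # arbitrary seed
print(sample(x, size = 10, replace = TRUE))  # ten simulated rolls
```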
R
P(45 < X < 62) = 0.5764
Using the Z-score: 0.5764
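The probability above can be reproduced for X ~ N(50, 100), that is, mean 50 and standard deviation 10, as the plot title below indicates. This sketch computes it directly and through standardization, and defines the `mu` and `sigma` used by the plotting code.

```r
mu    <- 50  # mean (from the plot title)
sigma <- 10  # standard deviation, the square root of the variance 100

# Direct computation with the CDF
prob <- pnorm(62, mean = mu, sd = sigma) - pnorm(45, mean = mu, sd = sigma)

# Same probability via Z = (X - mu) / sigma
z_lower <- (45 - mu) / sigma
z_upper <- (62 - mu) / sigma
prob_z  <- pnorm(z_upper) - pnorm(z_lower)

cat(sprintf("P(45 < X < 62) = %.4f\n", prob))
cat(sprintf("Using the Z-score: %.4f\n", prob_z))
```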
# Visualization
x <- seq(20, 80, length = 200)
plot(x, dnorm(x, mean = mu, sd = sigma),
type = "l", lwd = 2, col = "darkred",
main = "PDF Normal N(50, 100)",
xlab = "x", ylab = "f(x)")
# Area P(45 < X < 62)
x_shade <- seq(45, 62, length = 100)
y_shade <- dnorm(x_shade, mean = mu, sd = sigma)
polygon(c(45, x_shade, 62), c(0, y_shade, 0), col = rgb(1, 0, 0, 0.3), border = NA)
abline(v = mu, col = "blue", lty = 2)
[1] 0.3989423
[1] 0.9750021
[1] 1.959964
[1] -0.980520682 -0.915321404 -0.242386558 2.029502355 0.783383026
[6] 2.397292595 0.836565411 2.496647504 -1.194228162 -0.214326910
[11] -0.644745295 -0.942863174 0.582820234 0.624942260 -0.064528109
[16] 0.538575959 0.454537066 0.587575703 0.513498628 -0.648273294
[21] -0.361535386 -0.208882508 -0.191647582 -1.515058032 1.202807428
[26] 2.462176101 -0.054692734 0.928465006 -1.849197876 -0.176551199
[31] -0.076880383 1.318193516 2.718757185 -1.593345393 2.132974962
[36] -0.535683237 -0.920330812 -0.930286111 0.280197305 0.917064954
[41] -2.346834487 -0.626612005 -0.223786056 2.561655198 0.246609717
[46] -0.084556006 1.088663428 0.062612465 0.165482326 1.282162313
[51] -0.386215486 0.384398038 -0.212153146 -0.398740606 0.979794148
[56] -0.457217645 1.398951416 -0.614087144 -0.511371504 -1.234483691
[61] -0.379607236 0.190716622 0.424068282 -0.816954468 -0.235109421
[66] -0.555044910 -0.468561578 1.307533652 1.926788129 0.980300805
[71] 0.009675772 0.921451351 0.742992046 0.507275510 -0.619532895
[76] -0.384915337 0.164747569 0.633441643 -1.258560231 -0.887734437
[81] 1.064301904 -0.117037256 0.899000183 -1.691872858 -0.433265642
[86] 0.651411703 -0.183497059 -0.572777184 0.296999871 0.572448351
[91] 1.554619793 0.132949022 1.112457114 1.544183698 1.246944848
[96] -0.073408988 0.047217060 0.464199343 -1.421813362 -0.710557694
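The first three values above are standard normal quantities: the density at 0, the CDF at 1.96, and the 97.5% quantile. A sketch of the corresponding calls, with an arbitrary seed for the random draws:

```r
dens_0 <- dnorm(0)      # density at the mean
cdf    <- pnorm(1.96)   # P(Z <= 1.96)
q975   <- qnorm(0.975)  # 97.5% quantile

print(dens_0)
print(cdf)
print(q975)

set.seed(1)  # arbitrary seed; the draws will differ from the output above
z <- rnorm(100)
print(summary(z))
```

`pnorm()` and `qnorm()` are inverses of each other, so `qnorm(pnorm(1.96))` returns 1.96.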

[1] 0.01176471
[1] 0.3529412
[1] 62.5
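The three values above are consistent with a continuous uniform distribution on the interval from 20 to 105: the density is 1/85, the CDF at 50 is 30/85, and the median is 62.5. The endpoints and the evaluation point are inferred from the numbers and should be read as assumptions.

```r
a <- 20   # assumed lower bound
b <- 105  # assumed upper bound

dens <- dunif(50, min = a, max = b)   # constant density 1 / (b - a)
cdf  <- punif(50, min = a, max = b)   # P(X <= 50)
med  <- qunif(0.5, min = a, max = b)  # median

print(dens)
print(cdf)
print(med)
```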


For Loop:
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
[1] 1 4 9 16 25
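The output above (the even numbers 2 to 10, then the squares of 1 to 5) can be produced with two short loops. A minimal sketch, with the variable name `squares` as an assumption:

```r
# Print 2, 4, 6, 8, 10, one value per iteration
for (i in 1:5) {
  print(2 * i)
}

# Fill a preallocated vector with squares
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}
print(squares)
```

Preallocating the result with `numeric(5)` avoids growing the vector inside the loop. For a simple case like this, the vectorized form `(1:5)^2` gives the same result.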
The replicate function:
The replicate function is very useful for statistical simulation.
Simulating the distribution of the sample mean:
R
# Population
populasi <- c(1, 2, 3, 4, 5, 6, 7, 8)
# Simulate 1000 times with sample size 30
set.seed(123)
sample_means <- replicate(1000, {
  sample_data <- sample(populasi, 30, replace = TRUE)
  mean(sample_data)
})
# Visualization
hist(sample_means,
main = "Distribution of Sample Means (n=30)",
xlab = "Sample Mean",
col = "lightblue",
breaks = 30,
probability = TRUE)
# Overlay normal curve
curve(dnorm(x, mean = mean(sample_means), sd = sd(sample_means)),
add = TRUE, col = "red", lwd = 2)
Mean of sample means: 4.5013
SD of sample means: 0.4173
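Comparing different sample sizes:
R
# Simulate with several sample sizes
# Population SD (divide by N, not N - 1), since populasi is the whole population
sigma_pop <- sqrt(mean((populasi - mean(populasi))^2))
old_par <- par(mfrow = c(2, 2))
for (n in c(5, 10, 30, 50)) {
  sample_means <- replicate(1000, mean(sample(populasi, n, replace = TRUE)))
  hist(sample_means,
       main = paste("Sample Size =", n),
       xlab = "Sample Mean",
       col = "lightgreen",
       breaks = 20,
       probability = TRUE)
  # Theoretical sampling distribution: N(mu, sigma^2 / n)
  curve(dnorm(x, mean = mean(populasi), sd = sigma_pop / sqrt(n)),
        add = TRUE, col = "red", lwd = 2)
}
par(old_par) # restore the previous plotting layout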
R
μ_X̄ = 50.0000
σ_X̄ = 1.6667
P(45 < X̄ < 55) = 0.9973
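The three results above follow from the sampling distribution of the mean, where the mean of X̄ equals μ and its standard deviation equals σ/√n. They are consistent with μ = 50, σ = 10, and n = 36, which this sketch assumes because the original parameters are not shown.

```r
mu    <- 50  # assumed population mean
sigma <- 10  # assumed population standard deviation
n     <- 36  # assumed sample size

mu_xbar    <- mu               # mean of the sample mean
sigma_xbar <- sigma / sqrt(n)  # standard error of the sample mean

# Probability that the sample mean falls between 45 and 55
prob <- pnorm(55, mean = mu_xbar, sd = sigma_xbar) -
  pnorm(45, mean = mu_xbar, sd = sigma_xbar)

cat(sprintf("mu_xbar = %.4f\n", mu_xbar))
cat(sprintf("sigma_xbar = %.4f\n", sigma_xbar))
cat(sprintf("P(45 < X-bar < 55) = %.4f\n", prob))
```

With these values the interval spans three standard errors on each side of the mean, which is why the probability is close to 1.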
[1] 0.05950834
[1] 0.967356
[1] 2.144787
[1] -2.62249002 -4.15406058 0.84861094 0.19334457 0.66940683 0.57401384
[7] 0.56665056 0.05276854 -0.51014017 -0.81351881 1.08366796 -0.63582645
[13] -0.30269643 -0.36414360 2.70317304 -0.71835290 -0.49096799 0.67749892
[19] -0.12060391 -1.28350798 0.94712489 -0.76391289 1.97659297 -0.47178298
[25] -0.23250629 0.55180275 -0.92795831 1.68531129 0.79484392 -0.21530552
[31] 0.73522088 2.58144307 0.02351631 -1.45275040 -2.71785259 -0.64265257
[37] -1.05466980 -0.48310054 2.13913209 -0.61807360 -0.14338389 -1.23879954
[43] -1.16364309 0.79165505 0.95575103 -0.21311130 -2.09228500 0.99684543
[49] 0.92379512 4.75640219 -0.15232457 -0.34448416 -0.28409469 0.14944620
[55] 0.50741149 -1.41303840 0.15097845 0.55426210 -2.14923488 -0.78741090
[61] -0.77821930 0.33839129 -2.33949419 0.69374910 0.06553566 0.17039155
[67] 0.04655677 0.10147598 -0.68217637 -0.13113890 1.44278813 -1.08139378
[73] -1.27419142 2.82360289 0.33074624 0.62801599 -0.58281145 0.07018520
[79] -0.15619497 0.52467228 -0.81856676 -0.81316681 -1.69441437 -0.27579374
[85] -0.24511727 -1.09968646 0.54965428 -0.12506035 0.12225031 -0.23304499
[91] 0.43738080 -0.11189371 -1.87866487 0.31271523 1.06313345 2.65770332
[97] 0.59661662 2.10665958 -0.66862829 1.20781391
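The critical value 2.144787 above is the 97.5% quantile of a t distribution with 14 degrees of freedom. The other two values appear to be the density and CDF at x = 2 for the same distribution. The degrees of freedom and evaluation point in this sketch are inferred, not stated in the text.

```r
df <- 14  # assumed degrees of freedom

dens   <- dt(2, df = df)      # density at x = 2
cdf    <- pt(2, df = df)      # P(T <= 2)
t_crit <- qt(0.975, df = df)  # two-sided 5% critical value

print(dens)
print(cdf)
print(t_crit)

set.seed(1)  # arbitrary seed; the draws will differ from the output above
t_draws <- rt(100, df = df)
print(range(t_draws))
```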
# Visualization: comparing the t and normal distributions
x <- seq(-4, 4, length = 200)
plot(x, dnorm(x), type = "l", lwd = 2, col = "blue",
     main = "t Distribution vs Normal Distribution",
     xlab = "x", ylab = "Density")
lines(x, dt(x, df = 5), col = "red", lwd = 2)
lines(x, dt(x, df = 14), col = "green", lwd = 2)
legend("topright",
legend = c("Normal", "t(df=5)", "t(df=14)"),
col = c("blue", "red", "green"), lwd = 2)
R
Estimated mean of mpg: 20.0906
Estimated variance of mpg: 36.3241
Proportion of automatic cars: 0.5938
Proportion of manual cars: 0.4062
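These point estimates come from the built-in `mtcars` dataset, where `am` is 0 for automatic and 1 for manual transmission. A sketch of how they could be computed (the variable names are illustrative):

```r
mean_mpg <- mean(mtcars$mpg)  # point estimate of the population mean
var_mpg  <- var(mtcars$mpg)   # point estimate of the population variance

# The mean of a logical vector is the proportion of TRUE values
prop_auto   <- mean(mtcars$am == 0)
prop_manual <- mean(mtcars$am == 1)

cat(sprintf("Estimated mean of mpg: %.4f\n", mean_mpg))
cat(sprintf("Estimated variance of mpg: %.4f\n", var_mpg))
cat(sprintf("Proportion of automatic cars: %.4f\n", prop_auto))
cat(sprintf("Proportion of manual cars: %.4f\n", prop_manual))
```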
R
# Data
x_bar <- mean(mtcars$mpg)
sigma <- 6 # population standard deviation, assumed known
n <- nrow(mtcars)
alpha <- 0.05
# 95% CI
z_critical <- qnorm(1 - alpha/2)
margin_error <- z_critical * sigma / sqrt(n)
lower <- x_bar - margin_error
upper <- x_bar + margin_error
cat(sprintf("95%% CI for the mean (σ known): (%.4f, %.4f)\n", lower, upper))
95% CI for the mean (σ known): (18.0118, 22.1695)
R
# Data
x_bar <- mean(mtcars$mpg)
s <- sd(mtcars$mpg)
n <- nrow(mtcars)
alpha <- 0.05
# 95% CI using the t distribution
t_critical <- qt(1 - alpha/2, df = n-1)
margin_error <- t_critical * s / sqrt(n)
lower <- x_bar - margin_error
upper <- x_bar + margin_error
cat(sprintf("95%% CI for the mean (σ unknown): (%.4f, %.4f)\n", lower, upper))
95% CI for the mean (σ unknown): (17.9177, 22.2636)
R
data(iris)
# Data
x1 <- iris$Sepal.Length[iris$Species == "setosa"]
x2 <- iris$Sepal.Length[iris$Species == "versicolor"]
mean1 <- mean(x1)
mean2 <- mean(x2)
var1 <- var(x1)
var2 <- var(x2)
n1 <- length(x1)
n2 <- length(x2)
# Pooled variance
var_pooled <- ((n1-1)*var1 + (n2-1)*var2) / (n1 + n2 - 2)
sd_pooled <- sqrt(var_pooled)
alpha <- 0.05
# 95% CI
diff <- mean2 - mean1
t_critical <- qt(1-alpha/2, df = n1+n2-2)
margin_error <- t_critical * sd_pooled * sqrt(1/n1 + 1/n2)
lower <- diff - margin_error
upper <- diff + margin_error
cat(sprintf("95%% CI for the difference in means: (%.4f, %.4f)\n", lower, upper))
95% CI for the difference in means: (0.7546, 1.1054)
R
data(sleep)
# Compute the paired differences
diff <- with(sleep, extra[group == 2] - extra[group == 1])
mean_diff <- mean(diff)
sd_diff <- sd(diff)
n <- length(diff)
alpha <- 0.05
# 95% CI
t_critical <- qt(1-alpha/2, df = n-1)
margin_error <- t_critical * sd_diff / sqrt(n)
lower <- mean_diff - margin_error
upper <- mean_diff + margin_error
cat(sprintf("95%% CI for the mean difference (paired): (%.4f, %.4f)\n", lower, upper))
95% CI for the mean difference (paired): (0.7001, 2.4599)
R
# H0: μ = 250
# H1: μ ≠ 250
# α = 0.05
mu_0 <- 250
alpha <- 0.05
# Test statistic
x_bar <- mean(mtcars$disp)
s <- sd(mtcars$disp)
n <- nrow(mtcars)
Z <- (x_bar - mu_0) / (s / sqrt(n))
# Rejection region
Z_lower <- qnorm(alpha/2)
Z_upper <- qnorm(1 - alpha/2)
# P-value
p_value <- 2 * (1 - pnorm(abs(Z)))
# Decision
keputusan <- ifelse(Z < Z_lower | Z > Z_upper, "H0 is rejected", "H0 is not rejected")
cat(sprintf("Test statistic Z = %.4f\n", Z))
cat(sprintf("Rejection region: Z < %.4f or Z > %.4f\n", Z_lower, Z_upper))
cat(sprintf("P-value = %.6f\n", p_value))
cat(sprintf("Decision: %s\n", keputusan))
Test statistic Z = -0.8799
Rejection region: Z < -1.9600 or Z > 1.9600
P-value = 0.378914
Decision: H0 is not rejected
R
t-test results:
One Sample t-test
data: mtcars$disp
t = -0.8799, df = 31, p-value = 0.3857
alternative hypothesis: true mean is not equal to 250
95 percent confidence interval:
186.0372 275.4065
sample estimates:
mean of x
230.7219
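The output above is what `t.test()` prints for a one-sample test of `mtcars$disp` against a hypothesized mean of 250. A sketch of a call that reproduces it (the name `res_one` is illustrative):

```r
# H0: mu = 250 versus H1: mu != 250, using the sample standard deviation
res_one <- t.test(mtcars$disp, mu = 250)
print(res_one)

# Individual pieces of the result can be extracted by name
cat(sprintf("t = %.4f, p-value = %.4f\n",
            res_one$statistic, res_one$p.value))
```

The t statistic equals the Z statistic computed by hand earlier, but the p-value is slightly larger because it comes from a t distribution with 31 degrees of freedom instead of the normal.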
R
t-test results (equal variances):
Two Sample t-test
data: mtcars$disp[mtcars$am == 0] and mtcars$disp[mtcars$am == 1]
t = 4.0152, df = 30, p-value = 0.0003662
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
72.15611 221.54025
sample estimates:
mean of x mean of y
290.3789 143.5308
t-test results (unequal variances):
Welch Two Sample t-test
data: mtcars$disp[mtcars$am == 0] and mtcars$disp[mtcars$am == 1]
t = 4.1977, df = 29.258, p-value = 0.00023
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
75.32779 218.36857
sample estimates:
mean of x mean of y
290.3789 143.5308
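Both two-sample outputs above compare `disp` between automatic (`am == 0`) and manual (`am == 1`) cars. They differ only in the `var.equal` argument of `t.test()`. A sketch of the two calls, with illustrative result names:

```r
disp_auto   <- mtcars$disp[mtcars$am == 0]
disp_manual <- mtcars$disp[mtcars$am == 1]

# Pooled-variance (Student) t-test
res_equal <- t.test(disp_auto, disp_manual, var.equal = TRUE)
# Welch t-test, which does not assume equal variances (the default)
res_welch <- t.test(disp_auto, disp_manual, var.equal = FALSE)

print(res_equal)
print(res_welch)
```

The Welch version adjusts the degrees of freedom downward, here from 30 to about 29.26, so it is usually the safer choice when the group variances may differ.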
R
# Create simulated data
set.seed(123)
pops.1 <- rnorm(n = 10, mean = 1, sd = 1)
pops.2 <- rnorm(n = 10, mean = 0.9, sd = 1)
pops.3 <- rnorm(n = 10, mean = 1.1, sd = 1)
pop.source <- c(rep("pops.1", 10), rep("pops.2", 10), rep("pops.3", 10))
pops.value <- c(pops.1, pops.2, pops.3)
data_anova <- data.frame(pop.source, pops.value)
# ANOVA test
# H0: μ1 = μ2 = μ3
# H1: at least one mean differs
result_anova <- aov(pops.value ~ pop.source, data = data_anova)
summary(result_anova)
            Df Sum Sq Mean Sq F value Pr(>F)
pop.source   2   1.16  0.5802    0.61  0.551
Residuals   27  25.68  0.9512
Critical F value = 3.3541
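The critical value 3.3541 is the 95% quantile of the F distribution with 2 and 27 degrees of freedom, matching the ANOVA table. A sketch of how it could be obtained and compared with the observed F value (variable names are illustrative, and α = 0.05 is assumed):

```r
alpha <- 0.05  # assumed significance level
df1   <- 2     # between-group degrees of freedom (k - 1)
df2   <- 27    # residual degrees of freedom (N - k)

f_critical <- qf(1 - alpha, df1 = df1, df2 = df2)
cat(sprintf("Critical F value = %.4f\n", f_critical))

# Observed F from the ANOVA table above
f_observed <- 0.61
reject_h0  <- f_observed > f_critical
print(reject_h0)
```

Because the observed F is far below the critical value, H0 is not rejected, in line with the p-value of 0.551.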
R
Chi-squared test results:
Chi-squared test for given probabilities
data: obs
X-squared = 5.6, df = 5, p-value = 0.3471
Critical chi-squared value = 11.0705
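The output above is a goodness-of-fit test with six categories (df = 5). The original counts are not shown, so the sketch below uses hypothetical counts for 60 rolls of a die, chosen only to illustrate the call. The name `obs` follows the printed output, and α = 0.05 is assumed.

```r
# Hypothetical observed counts for faces 1 to 6 (not the original data)
obs <- c(15, 7, 7, 13, 8, 10)

# H0: all six faces are equally likely (the default for chisq.test)
res_gof <- chisq.test(obs)
print(res_gof)

# Critical value at alpha = 0.05 with k - 1 = 5 degrees of freedom
chi_critical <- qchisq(0.95, df = length(obs) - 1)
cat(sprintf("Critical chi-squared value = %.4f\n", chi_critical))
```

The statistic is compared with the critical value: H0 is rejected only when the statistic exceeds it.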
R
# Test the independence of treatment and improvement
treatment_url <- "https://raw.githubusercontent.com/selva86/datasets/master/treatment.csv"
treatment <- read.csv(treatment_url)
# Contingency table
tabel <- table(treatment$treatment, treatment$improved)
# Chi-squared test
result <- chisq.test(tabel)
print(result)
Computing the correlation:
mpg cyl disp hp drat wt qsec vs am gear carb
mpg 1.00 -0.85 -0.85 -0.78 0.68 -0.87 0.42 0.66 0.60 0.48 -0.55
cyl -0.85 1.00 0.90 0.83 -0.70 0.78 -0.59 -0.81 -0.52 -0.49 0.53
disp -0.85 0.90 1.00 0.79 -0.71 0.89 -0.43 -0.71 -0.59 -0.56 0.39
hp -0.78 0.83 0.79 1.00 -0.45 0.66 -0.71 -0.72 -0.24 -0.13 0.75
drat 0.68 -0.70 -0.71 -0.45 1.00 -0.71 0.09 0.44 0.71 0.70 -0.09
wt -0.87 0.78 0.89 0.66 -0.71 1.00 -0.17 -0.55 -0.69 -0.58 0.43
qsec 0.42 -0.59 -0.43 -0.71 0.09 -0.17 1.00 0.74 -0.23 -0.21 -0.66
vs 0.66 -0.81 -0.71 -0.72 0.44 -0.55 0.74 1.00 0.17 0.21 -0.57
am 0.60 -0.52 -0.59 -0.24 0.71 -0.69 -0.23 0.17 1.00 0.79 0.06
gear 0.48 -0.49 -0.56 -0.13 0.70 -0.58 -0.21 0.21 0.79 1.00 0.27
carb -0.55 0.53 0.39 0.75 -0.09 0.43 -0.66 -0.57 0.06 0.27 1.00
Correlation between disp and wt: 0.8880
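The matrix and the single coefficient above can be obtained with `cor()`. A sketch, with illustrative variable names:

```r
# Pearson correlation matrix of all mtcars columns, rounded to 2 decimals
cor_matrix <- round(cor(mtcars), 2)
print(cor_matrix)

# Correlation between two specific variables
r_disp_wt <- cor(mtcars$disp, mtcars$wt)
cat(sprintf("Correlation between disp and wt: %.4f\n", r_disp_wt))
```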
Correlation visualization (heatmap):
R

Correlation test:
R
Correlation test results:
Pearson's product-moment correlation
data: mtcars$disp and mtcars$wt
t = 10.576, df = 30, p-value = 1.222e-11
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7811586 0.9442902
sample estimates:
cor
0.8879799
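The output above is what `cor.test()` prints for a Pearson test on `disp` and `wt`. A sketch of the call (the name `res_cor` is illustrative):

```r
# H0: the true correlation is 0 versus H1: it is not 0
res_cor <- cor.test(mtcars$disp, mtcars$wt, method = "pearson")
print(res_cor)

# The estimate and its confidence interval can be extracted directly
print(res_cor$estimate)
print(res_cor$conf.int)
```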
Visualization with ggscatter:
Building the regression model:
R
Call:
lm(formula = wt ~ disp, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-0.89044 -0.29775 -0.00684 0.33428 0.66525
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.5998146 0.1729964 9.248 2.74e-10 ***
disp 0.0070103 0.0006629 10.576 1.22e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4574 on 30 degrees of freedom
Multiple R-squared: 0.7885, Adjusted R-squared: 0.7815
F-statistic: 111.8 on 1 and 30 DF, p-value: 1.222e-11
Regression coefficients:
R
Regression equation: wt = 1.5998 + 0.007010 × disp
Interpretation:
- Intercept (β₀) = 1.5998
- Slope (β₁) = 0.007010
- For every 1-unit increase in displacement, weight increases by 0.007010 units on average
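The model call shown in the summary output is `lm(formula = wt ~ disp, data = mtcars)`. The sketch below fits it and extracts the coefficients under the names `model`, `beta_0`, and `beta_1`, which the plotting code further down refers to. How the original defined them is not shown, so the extraction step is an assumption.

```r
# Simple linear regression of weight on displacement
model <- lm(wt ~ disp, data = mtcars)
print(summary(model))

# Extract intercept and slope by name
beta_0 <- unname(coef(model)["(Intercept)"])
beta_1 <- unname(coef(model)["disp"])

cat(sprintf("Regression equation: wt = %.4f + %.6f x disp\n", beta_0, beta_1))
```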
Regression visualization:
R
# Base R
plot(mtcars$disp, mtcars$wt,
main = "Weight vs Displacement",
xlab = "Displacement",
ylab = "Weight",
pch = 19, col = "blue")
abline(model, col = "red", lwd = 2)
# Add the equation to the plot
legend("topleft",
legend = sprintf("wt = %.2f + %.4f × disp", beta_0, beta_1),
bty = "n", cex = 0.9)
With ggplot2:
R
ggplot(mtcars, aes(x = disp, y = wt)) +
geom_point(size = 3, color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red") +
labs(title = "Linear Regression: Weight vs Displacement",
x = "Displacement (cu.in.)",
y = "Weight (1000 lbs)",
subtitle = sprintf("wt = %.2f + %.4f × disp", beta_0, beta_1)) +
theme_minimal()
`geom_smooth()` using formula = 'y ~ x'

Model diagnostics:

Goodness of Fit:
R-squared: 0.7885
Adjusted R-squared: 0.7815
Interpretation: 78.85% of the variation in weight is explained by displacement
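The goodness-of-fit figures above are stored in the model summary. A self-contained sketch that refits the same `wt ~ disp` model and reads them out (variable names are illustrative):

```r
model <- lm(wt ~ disp, data = mtcars)
fit_summary <- summary(model)

r2     <- fit_summary$r.squared      # share of variance explained
adj_r2 <- fit_summary$adj.r.squared  # adjusted for the number of predictors

cat(sprintf("R-squared: %.4f\n", r2))
cat(sprintf("Adjusted R-squared: %.4f\n", adj_r2))

# In simple linear regression, R-squared equals the squared correlation
print(all.equal(r2, cor(mtcars$disp, mtcars$wt)^2))
```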
Prediction:
R
Predicted weight for new displacement values:
disp fit lwr upr
1 150 2.651363 1.696444 3.606283
2 200 3.001880 2.052322 3.951437
3 250 3.352396 2.403390 4.301401
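The table above has the shape of `predict()` output with interval bounds, and its widths match a 95% prediction interval. A self-contained sketch under that reading, with `new_data` and `pred` as illustrative names:

```r
model <- lm(wt ~ disp, data = mtcars)

# New displacement values to predict for
new_data <- data.frame(disp = c(150, 200, 250))

# Point predictions with 95% prediction intervals
pred <- predict(model, newdata = new_data, interval = "prediction")
print(cbind(new_data, pred))
```

Using `interval = "confidence"` instead would give narrower bounds, because it describes uncertainty in the mean response and not in an individual new observation.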
Well done! You have completed the comprehensive review of the R material for Pengantar Sains Data. Let's recap your learning journey:
Skills you have mastered:
Congratulations! You now have a command of the fundamental R toolkit for data science. Keep practicing and don't be afraid to experiment! 🎉