Meeting 8: Cramér-von Mises Test, Friedman Test, Quade Test, Kruskal-Wallis Test, Correlation, Nonparametric Regression, Monotonic Regression, Kernel Smoothing, Spline Regression

Nonparametric Statistics

Offline at the Department of Mathematics
Published

December 3, 2024

Guide Table of Nonparametric Tests

Cramér-von Mises Test

Lindsey, Herzberg, and Watts (1987) give data on the width of the first joint of the second tarsus for two species of the insect Chaetocnema. Do these data indicate a difference between the population distributions of widths for the two species?

Species A: 131 134 137 127 128 118 134 129 131 115
Species B: 107 122 144 131 108 118 122 127 125 124
df <- data.frame(Species_A = c(131,134,137,127,128,118,134,129,131,115),
                 Species_B = c(107,122,144,131,108,118,122,127,125,124))
df

Goal: Perform the Cramér-von Mises test to determine whether the distributions of the two populations differ.

Hipotesis:

\(H_0: F(x) = G(x)\) for all \(x\) from \(- \infty\) to \(+ \infty\)

\(H_1: F(x) \neq G(x)\) for at least one value of \(x\)

The cvm_test() function from the twosamples library can be used:

# install.packages('twosamples')
library(twosamples)
cvm_test(df$Species_A,df$Species_B)

What can we conclude?
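A minimal interpretation sketch (assuming \(\alpha = 0.05\)): per the twosamples documentation, cvm_test() returns a length-two vector holding the test statistic and the p-value; note the p-value is permutation-based, so it varies slightly between runs.

res <- cvm_test(df$Species_A, df$Species_B)
res[2] < 0.05  # TRUE suggests rejecting H0 of identical distributions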

Friedman Test

For a group of seven students, pulse rates (per minute) were measured before exercise (I), immediately after exercise (II), and 5 minutes after exercise (III). The data are given below. Use the Friedman test to test for differences among the pulse rates on the three occasions.

Dataset_Friedmann <- data.frame(Student = c('A','A','A','B','B','B','C','C','C','D','D','D','E','E','E','F','F','F','G','G','G'),
                                Time = rep(c(1,2,3), 7),
                                Score = c(72, 120, 76, 96, 120, 95, 88, 132, 104, 92, 120, 96, 74, 101, 84, 76, 96, 72, 82,  112, 76))

Goal: Perform the Friedman test to determine whether pulse rate per minute differs before exercise, immediately after exercise, and 5 minutes after exercise for this group of seven students.

Hipotesis:

\(H_0:\) All treatments have identical effects

\(H_1:\) At least one treatment tends to yield larger observed values than at least one other treatment

The friedman.test() function from the stats library can be used:

friedman.test(y = Dataset_Friedmann$Score,
              groups = Dataset_Friedmann$Time,
              blocks = Dataset_Friedmann$Student)

What can we conclude?
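A minimal interpretation sketch (assuming \(\alpha = 0.05\)): friedman.test() returns an htest object whose p.value element can be compared with \(\alpha\).

ft <- friedman.test(y = Dataset_Friedmann$Score,
                    groups = Dataset_Friedmann$Time,
                    blocks = Dataset_Friedmann$Student)
ft$p.value < 0.05  # TRUE suggests pulse rates differ across the three occasions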

Quade Test

The Quade test is an alternative to the Friedman test; both tests have the same null hypothesis.

Suppose we want to analyze the data from the Friedman Test section above using the Quade test.

The quade.test() function from the stats library can be used:

quade.test(Score ~ Time | Student,
           data = Dataset_Friedmann)

Using the quadeAllPairsTest() function from the PMCMRplus library, we can perform the Quade multiple-comparison (post hoc) test:

# install.packages("PMCMRplus")
library(PMCMRplus)
quadeAllPairsTest(y      = Dataset_Friedmann$Score,
                  groups = Dataset_Friedmann$Time,
                  blocks = Dataset_Friedmann$Student,
                  p.adjust.method = "fdr")

The output shows which pairs of groups differ (or do not differ) significantly.
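As a sketch (assuming PMCMRplus's usual object conventions), storing the result lets us list the pairwise comparisons again via summary():

res_quade <- quadeAllPairsTest(y = Dataset_Friedmann$Score,
                               groups = Dataset_Friedmann$Time,
                               blocks = Dataset_Friedmann$Student,
                               p.adjust.method = "fdr")
summary(res_quade)  # one line per group pair with its adjusted p-value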

Kruskal-Wallis Test

Lubischew (1962) gives measurements of maximum head width, in units of 0.01 mm, for three species of Chaetocnema. Part of the data is given below. Use the Kruskal-Wallis test to see whether head widths differ among the three Chaetocnema species.

df <- data.frame(Species = rep(c("Species_1","Species_2","Species_3"), times = c(10,11,8)),
                 lebar   = c(53,50,52,50,49,47,54,51,52,57,49,49,47,54,43,51,49,51,50,46,49,58,51,45,53,49,51,50,51))  # lebar = head width (0.01 mm)

Hipotesis:

\(H_0:\) All k population distribution functions are identical

\(H_1:\) At least one population tends to yield larger observed values than at least one other population

Or, an alternative \(H_1\):

\(H_1:\) The k populations do not all have identical means

The kruskal.test() function from the stats library can be used:

kruskal.test(lebar ~ Species, data = df)

What can we conclude?
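A minimal interpretation sketch (assuming \(\alpha = 0.05\)): compare the p-value with \(\alpha\); if \(H_0\) is rejected, one optional follow-up (our choice, not prescribed by the exercise) is pairwise Wilcoxon rank-sum tests with a multiplicity adjustment.

kw <- kruskal.test(lebar ~ Species, data = df)
kw$p.value < 0.05  # TRUE suggests at least one species differs
# Optional post hoc: pairwise Wilcoxon tests with FDR adjustment
pairwise.wilcox.test(df$lebar, df$Species, p.adjust.method = "fdr")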

Correlation

Bardsley and Chambers (1984) give the numbers of beef cattle and sheep on 19 large farms in a region. Is there evidence of correlation between cattle and sheep numbers?

df <- data.frame(Cattle = c(41,0,42,15,47,0,0,0,56,67,707,368,231,104,132,200,172,146,0),
                 sheep  = c(4716,4605,4951,2745,6592,8934,9165,5917,2618,1105,150,2005,3222,7150,8658,6304,1800,5270,1537))

Spearman’s \(\rho\)

The Spearman's \(\rho\) correlation test can be performed with the cor.test() function from the stats library, using method = "spearman":

cor.test(df$Cattle, df$sheep, method="spearman")

Kendall’s \(\tau\)

The Kendall's \(\tau\) correlation test can be performed with the cor.test() function from the stats library, using method = "kendall":

cor.test(df$Cattle, df$sheep, method="kendall")
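Both variables contain ties (for example, the repeated zeros in Cattle), so cor.test() warns that exact p-values cannot be computed and falls back to an approximation. A sketch making this explicit (assuming the asymptotic approximation is acceptable here):

cor.test(df$Cattle, df$sheep, method = "spearman", exact = FALSE)
cor.test(df$Cattle, df$sheep, method = "kendall", exact = FALSE)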

Nonparametric Regression

A driver kept track of the number of miles she traveled and the number of gallons put in the tank each time she bought gasoline.

Miles Gallons
142 11.1
116 5.7
194 14.2
250 15.8
88 7.5
157 12.5
255 17.9
159 8.8
43 3.4
208 15.2
Miles = c(142, 116, 194, 250, 88, 157, 255, 159, 43, 208)
Gallons = c(11.1, 5.7, 14.2, 15.8, 7.5, 12.5, 17.9, 8.8, 3.4, 15.2)
df = data.frame(Gallons, Miles)
df

Libraries that will be used:

library(stats)
library(ggplot2)
library(ggpmisc)

A. Draw a diagram showing these points, using gallons as the x-axis

plot(Gallons, Miles, xlab = "Gallons", ylab = "Miles", 
     main = "Gallons x Miles", lwd = 2)

B. Estimate a and b using the method of least squares

model1 = lm(Miles ~ Gallons, data=df)
summary(model1)

C. Plot the least squares regression line on the diagram of part A

plot(Gallons, Miles, xlab = "Gallons", ylab = "Miles", 
     main = "Gallons x Miles", lwd = 2)
abline(reg = model1, col = "blue")

D. Suppose the EPA estimated this car’s mileage at 18 miles per gallon. Test the null hypothesis that this figure applies to this particular car and driver. (Use the test for slope)

\(H_0:\) The car's mileage (\(\beta\)) is 18 miles per gallon

\(H_1:\) The car's mileage (\(\beta\)) is not 18 miles per gallon

# Under H0 (beta = 18), compute the residuals U_i = Y_i - 18 * X_i
beta0 <- 18
results <- df$Miles - beta0 * df$Gallons
results

# Test statistic: Spearman correlation between X and the residuals
stat_uji <- cor(Gallons, results, method = "spearman")
stat_uji
# 0.6364 is the two-sided 5% critical value of Spearman's rho for n = 10;
# reject H0 when |stat_uji| exceeds it
abs(stat_uji) > 0.6364
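Equivalently (a sketch on our part, not the textbook's route), cor.test() reports a p-value for the same Spearman statistic:

cor.test(Gallons, results, method = "spearman")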

E. Find a 95% confidence interval for the mileage of this car and driver

# Function to calculate pairwise slopes
calculate_slopes <- function(X, Y) {
  slopes <- c()
  n <- length(X)
  
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      if (X[j] != X[i]) {  # Avoid division by zero
        S_ij <- (Y[j] - Y[i]) / (X[j] - X[i])
        slopes <- c(slopes, S_ij)
      }
    }
  }
  return(slopes)
}

# Calculate pairwise slopes
slopes <- calculate_slopes(df$Gallons, df$Miles)

# Median slope (Theil-Sen estimator)
median_slope <- median(slopes)

# 95% confidence interval (a simple approximation using sample quantiles of
# the pairwise slopes; Conover's exact interval instead picks specific order
# statistics of the sorted slopes)
lower_bound <- quantile(slopes, 0.025)  # 2.5th percentile
upper_bound <- quantile(slopes, 0.975)  # 97.5th percentile

# Output results
list(
  median_slope = median_slope,
  lower_bound = lower_bound,
  upper_bound = upper_bound
)
Tip: An alternative model

The pairwise slopes computed for the confidence interval above can also be used to form an estimated nonparametric regression equation, with the median pairwise slope estimating \(\beta\) and the medians of Y and X entering the intercept estimate, as sketched below.
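A minimal sketch of this estimate (our illustration; the intercept rule a = median(Y) - b * median(X) is one common convention for the Theil-Sen line):

b1 <- median(slopes)                              # slope: median of all pairwise slopes
a1 <- median(df$Miles) - b1 * median(df$Gallons)  # intercept from the medians of Y and X
c(intercept = a1, slope = b1)
# Overlay the fitted line on the part-A scatter plot
plot(Gallons, Miles, xlab = "Gallons", ylab = "Miles", main = "Theil-Sen fit")
abline(a = a1, b = b1, col = "darkgreen")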

Monotonic Regression

The nonparametric linear regression methods above can be used when the linear regression assumptions are met. In some situations we cannot assume the regression line is linear, but we can assume that E(Y|X) increases (or at least does not decrease) as X increases. In that case the regression is said to be monotonically increasing.

Xi Yi
0.0 30
0.5 30
1.0 30
1.8 28
2.2 24
2.7 19
4.0 17
4.0 9
4.9 12
5.6 12
6.0 6
6.5 8
7.3 4
8.0 5
8.8 6
9.3 4
9.8 6
# Data extracted from the table
data <- data.frame(
  Xi = c(0, 0.5, 1.0, 1.8, 2.2, 2.7, 4.0, 4.0, 4.9, 5.6, 6.0, 6.5, 7.3, 8.0, 8.8, 9.3, 9.8),
  Yi = c(30, 30, 30, 28, 24, 19, 17, 9, 12, 12, 6, 8, 4, 5, 6, 4, 6)
)

data$R_Xi <- rank(data$Xi, ties.method = "average")
data$R_Yi <- rank(data$Yi, ties.method = "average")

n <- nrow(data)
mean_R_Xi <- mean(data$R_Xi)
mean_R_Yi <- mean(data$R_Yi)

# Calculate b2 (slope)
b2 <- sum((data$R_Xi - mean_R_Xi) * (data$R_Yi - mean_R_Yi)) /
      sum((data$R_Xi - mean_R_Xi)^2)

# Calculate a2 (intercept)
a2 <- mean_R_Yi - b2 * mean_R_Xi

# Estimate R(Y) for Given R(X)
data$Rhat_Yi <- a2 + b2 * data$R_Xi

# Convert Rhat_Y back to Y
interpolate_Y <- function(R_hat_Y, Yi, R_Yi) {
  if (R_hat_Y %in% R_Yi) {
    return(Yi[which(R_Yi == R_hat_Y)][1])  # Exact rank match: return the corresponding Yi
  } else if (R_hat_Y < min(R_Yi)) {
    return(min(Yi))  # Below the smallest rank: return the smallest Yi
  } else if (R_hat_Y > max(R_Yi)) {
    return(max(Yi))  # Above the largest rank: return the largest Yi
  } else {
    # Find the nearest ranks for interpolation
    lower <- max(R_Yi[which(R_Yi < R_hat_Y)])
    upper <- min(R_Yi[which(R_Yi > R_hat_Y)])
    
    Y_lower <- Yi[which(R_Yi == lower)]
    Y_lower <- Y_lower[1]
    Y_upper <- Yi[which(R_Yi == upper)]
    Y_upper <- Y_upper[1]
    R_lower <- lower
    R_upper <- upper
    
    # Linear interpolation
    return(Y_lower + (R_hat_Y - R_lower) / (R_upper - R_lower) * (Y_upper - Y_lower))
  }
}

data$Yhat_i <- sapply(data$Rhat_Yi, interpolate_Y, Yi = data$Yi, R_Yi = data$R_Yi)

data$Rhat_Xi <- (data$R_Yi - a2)/b2

interpolate_X <- function(R_hat_X, X, R_X) {
  if (R_hat_X <= min(R_X)) {
    return(NULL)
  }
  if (R_hat_X >= max(R_X)) {
    return(NULL)
  }
  
  # Find the nearest ranks for interpolation
  lower <- max(which(R_X <= R_hat_X))
  upper <- min(which(R_X > R_hat_X))
  
  X_lower <- X[lower]
  X_upper <- X[upper]
  R_lower <- R_X[lower]
  R_upper <- R_X[upper]
  
  # Linear interpolation
  return(X_lower + (R_hat_X - R_lower) / (R_upper - R_lower) * (X_upper - X_lower))
}

data$Xhat_i <- sapply(data$Rhat_Xi, function(R_hat) interpolate_X(R_hat, data$Xi, data$R_Xi))
data

Now we can visualize the results

data_clean <- data[!sapply(data$Xhat_i, is.null), ]

combined <- rbind(
  as.matrix(data[, c('Xhat_i', 'Yi')]), 
  as.matrix(data[, c('Xi', 'Yhat_i')])
)

combined <- as.data.frame(combined)
colnames(combined) <- c('X', 'Y')
combined <- combined[!sapply(combined$X, is.null), ]
combined$X <- unlist(combined$X)
combined$Y <- unlist(combined$Y)
combined <- combined[order(combined$X), ]
combined

# Scatter plot of original data
plot(data$Xi, data$Yi, pch = 16, col = "blue", xlab = "Xi", ylab = "Yi",
     main = "Monotonic Regression")

lines(combined$X, combined$Y, type = "b", col = "red", pch = 19, lwd = 2)

# Add a legend
legend("topright", legend = c("Original Data", "Fitted Curve"), col = c("blue", "red"), pch = c(16, NA), lty = c(NA, 1))

Kernel Smoothing
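All of the fits in this section are instances of the Nadaraya-Watson estimator, which predicts at a point \(x\) by a kernel-weighted average of the observed responses: \(\hat{m}_h(x) = \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) y_i \Big/ \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)\), where \(K\) is the kernel function and \(h\) is the bandwidth. The manual loop below computes this directly for the Gaussian and Epanechnikov kernels.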

# Example 1 ---
## Libraries for plots
library(ggplot2) 
library(ggpubr)
## Entering the data and putting it in an appropriate form for analysis
x <- c(11, 22, 33, 44, 50, 56, 67, 70, 78, 89, 90, 100)
y <- c(2337, 2750, 2301, 2500, 1700, 2100, 1100, 1750, 1000, 1642, 2000, 1932)
df <- data.frame(x, y)

## Nadaraya-Watson bandwidths
plot(df$x, df$y, type = "l")

kernsmooth5 <- ksmooth(df$x, df$y, kernel="normal", bandwidth=5)
lines(kernsmooth5, lwd=2, col='blue')

kernsmooth10 <- ksmooth(df$x, df$y, kernel="normal", bandwidth=10)
lines(kernsmooth10, lwd=2, col='red')

legend("topleft",
       legend = c("b = 5", "b = 10"),
       col = c("blue", "red"),
       lty = 1,
       cex = 0.6)

## Defining both Gaussian and Epanechnikov kernel functions
gaussian_ker <- function(x){
  val = (1/sqrt(2*pi))*exp(-0.5*x^2)
  return(val)
}

epanech_ker <- function(x){
  ind1  = ifelse((-1 <= x & x<= 1),1,0)
  val = (3/4)*(1-x^2)*ind1
  return(val)
}

## Defining the bandwidth
h = 10

##fitting
fitted_gaussian <- c()
fitted_epan <- c()

## Loop to compute fitted values for both kernel functions
for (i in 1:nrow(df)) {
  x_temp <- df$x[i]
  
  temp_denominator_gauss <- 0
  temp_denominator_epan <- 0
  temp_num_gauss <- 0
  temp_num_epan <- 0
  
  for (k in 1:nrow(df)) {
    temp_denominator_gauss = temp_denominator_gauss + gaussian_ker((x_temp - df$x[k])/h)
    temp_denominator_epan = temp_denominator_epan + epanech_ker((x_temp - df$x[k])/h)
    temp_num_gauss = temp_num_gauss + gaussian_ker((x_temp - df$x[k])/h)*df$y[k]
    temp_num_epan = temp_num_epan + epanech_ker((x_temp - df$x[k])/h)*df$y[k]
  }
  fitted_gaussian[i] = temp_num_gauss/temp_denominator_gauss
  fitted_epan[i] = temp_num_epan/temp_denominator_epan
}

# Evaluate at the observed x values (x.points); without it, ksmooth() evaluates
# at n.points evenly spaced points, which would not line up with df$x
fitted_kern <- ksmooth(df$x, df$y, kernel = "normal", bandwidth = h, x.points = df$x)
fitted_kern <- as.data.frame(fitted_kern)$y
# Note: ksmooth() rescales the bandwidth internally (kernel quartiles at
# +/- 0.25 * bandwidth), so these values need not match fitted_gaussian exactly

df$fitted_gaussian <- fitted_gaussian
df$fitted_epan <- fitted_epan
df$fitted_nadaraya <- fitted_kern

## Data frame with all fitted values:
df

##For plotting
p1 <- ggplot(df, aes(x = x, y = fitted_gaussian)) + geom_point() +
  geom_point(aes(y = y), col = "red")+ geom_line() 

p2 <- ggplot(df, aes(x = x, y = fitted_epan)) + geom_point() +
  geom_point(aes(y = y), col = "red") + geom_line() 

p3 <- ggplot(df, aes(x = x, y = fitted_nadaraya)) + geom_point() +
  geom_point(aes(y = y), col = "red") + geom_line() 

ggarrange(p1, p2, p3, labels  = c("Gaussian", "Epanechnikov", "Nadaraya-Watson"), nrow = 1, ncol = 3)

Spline Regression

library(splines)
#Spline regression
Day = c(5,10,25,45,70,85,90,100,110,125,130,150,175)
Sales = c(100,125,250,300,350,460,510,460,430,400,370,340,330)
data = data.frame(Day,Sales)

ggplot(data, aes(x=Day, y = Sales)) + geom_point()

model_linear = lm(Sales ~Day, data = data)
summary(model_linear)
plot(Sales~Day)
lines(data$Day, predict(model_linear), col="red")
data

# Spline model
model_spline = lm(Sales ~ bs(Day, knots = c(10, 25, 70, 90,100,150)))
summary(model_spline)

plot(Sales ~ Day)
lines(data$Day, predict(model_spline), col = "red")

quantile(Day, 0.2) #20%
quantile(Day, 0.4) #40%
quantile(Day, 0.6) #60%
quantile(Day, 0.8) #80%

model_quantile = lm(Sales ~ bs(Day, knots = c(33,82,102,128)))
summary(model_quantile)
plot(Sales ~ Day)
lines(data$Day, predict(model_quantile), col = "red")
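A hedged alternative worth comparing (our addition, not part of the original exercise): natural cubic splines via splines::ns(), which force the fit to be linear beyond the boundary knots and are often better behaved at the edges than bs().

# Natural cubic spline with the same quantile-based knots (illustrative choice)
model_ns <- lm(Sales ~ ns(Day, knots = c(33, 82, 102, 128)), data = data)
summary(model_ns)
plot(Sales ~ Day)
lines(data$Day, predict(model_ns), col = "darkgreen")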