Category: R

  • Making music with R

    For usage help, see: readme

    ---
    title: "R Notebook"
    output: html_notebook
    ---
    ```{r}
    # Load the gm package
    library(gm)
    # Build the score data
    # Create a flute (instrument 74 is the flute)
    flute <- Instrument(74, pan = -90)
    flute
    # Create a tempo
    tempo <- Tempo(60, marking = "Adagio (half = 25)")
    tempo
    # Assemble the music
    music <- 
      Music() +
      Meter(4, 4) +  # 4/4 time
      Line(c("E5", "E5", "E5", "G5", "A5", "G5", "C5", "D5")) + flute + tempo  # note sequence
    music
    
    # Export the score (generates MP3 audio; MuseScore is called to render it)
    export(music,"~/x.mp3","musescore")
    export(music,"~/x.mscz","musescore")
    
    # Open the music in a web page or in MuseScore to view and edit it
    show(music,musescore = "-r 800 -T 5")
    
    # vignette("gm") shows the help documentation
    
    ```

    More example code:

    ---
    title: "R Notebook"
    output:
      pdf_document: default
      html_notebook: default
    ---
    # Basic syntax of the gm music package for R
    ```{r}
    # Load the gm package
    library(gm)
    # Build the score data
    # Create a violin (instrument 41 is the violin)
    violin <- Instrument(41, pan = -90)
    violin
    slur <- Slur(3, 8)  # slur from note 3 to note 8
    # Create a tempo
    tempo <- Tempo(60, marking = "Adagio (half = 25)")
    notehead <- Notehead(1, shape = "diamond", color = "#800080")
    tie <- Tie(1)  # tie starting at note 1
    # durations is shorter than pitches here, as in the original listing
    line <- Line(
      pitches = c("E5", "E5", "E5", "G5", "A5", "G5", "C5", "D5", "E5", "C5", "G4", "A4", "C5", "A4", "G4", "C-5", "G#4"),
      durations = c(1, 1, 1, 1, 0.5, 0.5)
    )
    # Assemble the music
    music <- 
      Music() +
      Meter(3, 4) +  # 3/4 time
      line +         # in pitch names, # marks a sharp and - a flat
      violin + 
      tempo +
      notehead  +
      slur +         # slur
      tie            # tie
    
    # Export the score (generates MP3 audio; MuseScore is called to render it)
    export(music,"~/x.mp3","musescore")
    export(music,"~/x.mscz","musescore")
    
    # Open the music in a web page or in MuseScore to view and edit it
    show(music,musescore = "-r 800 -T 5")
    
    # vignette("gm") shows the help documentation
    
    ```
    # example1
    ```{r}
    # gm can parse pitch names such as "C4" directly; the manual conversion below is an alternative
    notes <- c("C4", "E4", "G4", "C4", "E4", "E4", "E4", "G4", "A4", "G4")
    # Convert to MIDI numbers (hand-written mapping for octave 4 only)
    instrument <- Instrument(77)
    note_to_midi <- function(note) {
      notes_map <- list(
        C4 = 60, Cs4 = 61, D4 = 62, Ds4 = 63, E4 = 64, F4 = 65,
        Fs4 = 66, G4 = 67, Gs4 = 68, A4 = 69, As4 = 70, B4 = 71
      )
      notes_map[[note]]
    }
    pitches <- unname(sapply(notes, note_to_midi))  # drop the names sapply() adds
    pitches
    music <- Music() +
             Meter(3, 4) +   # 3/4 time
             Line(pitches) +
             instrument
    
    show(music)
    ```
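    The hand-written map in example1 only covers octave 4. A more general conversion can be sketched in base R as below. It assumes gm-style pitch names (a letter A to G, an optional accidental where # is a sharp and - is a flat, then an octave digit) and the usual convention that C4 is MIDI note 60. The function name `pitch_to_midi` is hypothetical and is not part of gm.

    ```r
    # Convert pitch names such as "C4", "G#4" or "C-5" to MIDI note numbers.
    # Assumption: "#" raises and "-" lowers the pitch by one semitone each.
    pitch_to_midi <- function(notes) {
      semitone <- c(C = 0, D = 2, E = 4, F = 5, G = 7, A = 9, B = 11)
      shift <- c("##" = 2, "#" = 1, "--" = -2, "-" = -1)
      parts <- regmatches(notes, regexec("^([A-Ga-g])(##|#|--|-)?([0-9])$", notes))
      vapply(parts, function(p) {
        if (length(p) == 0) return(NA_real_)  # not a valid pitch name
        accidental <- if (nzchar(p[3])) shift[[p[3]]] else 0
        12 * (as.integer(p[4]) + 1) + semitone[[toupper(p[2])]] + accidental
      }, numeric(1))
    }

    pitch_to_midi(c("C4", "E4", "G4", "G#4", "C-5"))
    ```

    The resulting numeric vector can be passed to Line() in the same way as the `pitches` vector in example1.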
    # example2: mixed durations, different instrument
    ```{r}
    pitches <- c(67, 67, 67, 62, 65, 67, 69, 67)
    durations <- c(0.5, 0.5, 1, 2, 0.5, 0.5, 1, 2)   # eighth + eighth + quarter / half / ...
    tempo <- Tempo(120)
    velocity <- Velocity(100)  # dynamics (note velocity)
    riff <- Music() + 
      Meter(3, 4) +
      Line(pitches, durations) +
      tempo + 
      velocity
    
    export(riff, "~/riff.mid")
    show(riff)
    ```
    # example3: a simple melody
    
    ```{r}
    library(gm)
    
    # Define the instrument, tempo, and line (the durations are mixed, not all quarter notes)
    instrument <- Instrument(41)  # violin
    tempo <- Tempo(90)
    line <- Line(
      pitches = c(60, 62, 64, 65, 67, 69, 71, 72, 73, 69, 71, 72, 73),
      durations = c(1, 1, 1.5, 0.5, 1)
    )
    # Build the melody
    melody <- 
      Music() +
      Meter(4, 4) +   # 4/4 time
      line +
      instrument + 
      tempo
    
    # Play or export
    #show(melody,musescore = "-r 800 -T 5") 
    show(melody) # if audio output is supported
    export(melody, "~/scale.mid","musescore")
    
    ```
    
    
    
  • R for Data Science (2e)

    R for Data Science (2e), second edition

    R for Data Science, previous edition

  • Adding grid lines to plots made with R's plot function

    R's plot function has no built-in grid argument, but grid lines can be added by calling grid() afterwards or by other means.

    Ways to add grid lines
    Using the grid() function
    By default, grid() draws grid lines aligned with the axis tick marks. Its arguments adjust the style (line width, color, line type):
        plot(x, y, xlim = c(0, 100), ylim = c(0, 1), type = "o")
        grid(lwd = 2, col = "lightgray", lty = "dotted")

    Setting the axis range style
    The xaxs and yaxs arguments control how the axis range is computed. The value "i" makes the axis span exactly the data range (or xlim/ylim) without the default 4% padding. For example:
        plot(x, y, yaxs = "i", xaxs = "i")

    Third-party packages or functions
    For finer control over the grid, use ggplot2 or the scatterplot function from the car package, whose grid argument switches the background grid on or off. For example:
        library(car)
        scatterplot(x, y, grid = FALSE)
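    The snippets above assume that x and y already exist. A self-contained sketch with made-up sample data (the values are purely illustrative) combines grid() with the "i" axis style and also shows two common base-graphics variations: drawing the grid behind the data with panel.first, and placing lines at custom positions with abline().

    ```r
    # Hypothetical sample data so that the example runs on its own
    x <- seq(0, 100, by = 5)
    y <- (x / 100)^2

    # Grid at the default tick marks, with axes spanning exactly xlim/ylim
    plot(x, y, type = "o", xlim = c(0, 100), ylim = c(0, 1),
         xaxs = "i", yaxs = "i")
    grid(lwd = 2, col = "lightgray", lty = "dotted")

    # Draw the grid before the data so that it stays in the background
    plot(x, y, type = "o",
         panel.first = grid(col = "lightgray", lty = "dotted"))

    # Grid lines at custom positions
    plot(x, y, type = "o")
    abline(h = seq(0, 1, by = 0.1), v = seq(0, 100, by = 10),
           col = "lightgray", lty = "dotted")
    ```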

  • Plotting a daily stock price line chart in R: CNOOC (600938)

    R code

    # Set the working directory
    setwd("C:/Users/czliu/Documents/python")
    
    # Read the CSV file
    df <- read.csv("stock_data_numeric.csv")
    colnames(df)
    
    # View the first few rows of the data
    head(df)
    
    # Handle missing values
    # df <- read.csv("data.csv", na.strings = c("NULL", "?"))  # read the data
    df <- na.omit(df)  # remove rows with missing values
    
    # Convert the date column (the column name is assumed to be lowercase 'date')
    df$date <- as.Date(df$date)
    
    # Draw the line chart (using the same 'date' column as above)
    plot(df$close ~ df$date,
         type = "l",
         main = "Stock Price Daily Change",
         xlab = "Date",
         ylab = "Close Price",
         ylim = c(min(df$close), max(df$close))
    )

    Adding four reference lines to the chart (mean, 25% quantile, median, 75% quantile). Code:

    # Set the working directory
    setwd("C:/Users/czliu/Documents/python")
    
    # Read the CSV file
    df <- read.csv("stock_data_numeric.csv")
    colnames(df)
    
    # View the first few rows of the data
    head(df)
    
    # Handle missing values
    # df <- read.csv("data.csv", na.strings = c("NULL", "?"))  # read the data
    df <- na.omit(df)  # remove rows with missing values
    
    # Convert the date column (the column name is assumed to be lowercase 'date')
    df$date <- as.Date(df$date)
    
    # Draw the line chart
    plot(df$close ~ df$date,
         type = "l",
         main = "Stock Price Daily Change",
         xlab = "Date",
         ylab = "Close Price",
         ylim = c(min(df$close), max(df$close))
    )
    
    # Add several horizontal reference lines (advanced)
    
    # 1. Mean: red dashed line
    mean_price <- mean(df$close)
    abline(h = mean_price, col = "red", lwd = 3, lty = 2)
    
    # 2. Median: blue solid line
    median_price <- median(df$close)
    abline(h = median_price, col = "blue", lwd = 3, lty = 1)
    
    
    # 3. Lower and upper quartiles: black dotted lines
    q1_price <- quantile(df$close, 0.25)
    q3_price <- quantile(df$close, 0.75)
    abline(h = q1_price, col = "black", lwd = 2, lty = 3)
    abline(h = q3_price, col = "black", lwd = 2, lty = 3)
    
    # Add a legend (line widths match the lines drawn above)
    legend("topright", 
           legend = c(paste("Mean: ", round(mean_price, 2)),
                     paste("Median: ", round(median_price, 2)),
                     paste("Q1: ", round(q1_price, 2)),
                     paste("Q3: ", round(q3_price, 2))),
           col = c("red", "blue", "black", "black"),
           lwd = c(3, 3, 2, 2),
           lty = c(2, 1, 3, 3),
           cex = 0.8)
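    The script above depends on a local CSV file. To try the same idea without that file, the sketch below simulates a price series (a random walk with invented values, not real CNOOC data) and draws the four reference lines with a single vectorized abline() call. The names `ref`, `cols`, `widths` and `types` are introduced here for illustration only.

    ```r
    set.seed(1)

    # Simulated daily closing prices (hypothetical data)
    df <- data.frame(
      date  = seq(as.Date("2025-01-01"), by = "day", length.out = 100),
      close = 25 + cumsum(rnorm(100, sd = 0.2))
    )

    # Reference levels: mean, median and the two quartiles
    ref <- c(
      Mean   = mean(df$close),
      Median = median(df$close),
      Q1     = unname(quantile(df$close, 0.25)),
      Q3     = unname(quantile(df$close, 0.75))
    )

    # One style entry per reference line, reused by abline() and legend()
    cols   <- c("red", "blue", "black", "black")
    widths <- c(3, 3, 2, 2)
    types  <- c(2, 1, 3, 3)

    plot(close ~ date, data = df, type = "l",
         main = "Simulated Daily Close", xlab = "Date", ylab = "Close Price")
    abline(h = ref, col = cols, lwd = widths, lty = types)
    legend("topright",
           legend = paste0(names(ref), ": ", round(ref, 2)),
           col = cols, lwd = widths, lty = types, cex = 0.8)
    ```

    Keeping the styles in shared vectors means the legend cannot drift out of sync with the lines it describes.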
    
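  • Reading a CSV file and handling missing values in R

    Reading a CSV file and handling its missing values is an important data-cleaning step in R. The methods and code examples are as follows:

    1. Read the CSV file and identify missing values
      When reading a file with read.csv(), the na.strings argument specifies which values should be treated as missing (NA).

    Example: read a CSV file, treating "NULL" and "?" as missing values

    data <- read.csv("data.csv", na.strings = c("NULL", "?"))

    2. Check for missing values
      The following methods check the data for missing values:

    Count the missing values in each column

    colSums(is.na(data))

    Check whether the data frame contains any missing value

    anyNA(data)

    3. Handle missing values
      Depending on your needs, either delete or fill in the missing values.
      (1) Delete missing values
      Delete rows that contain missing values:
      data_cleaned <- na.omit(data)
      Delete rows with a missing value in a specific column:
      data_cleaned <- data[!is.na(data$column_name), ]
      (2) Fill in missing values
      Fill with the mean:
      data$column_name[is.na(data$column_name)] <- mean(data$column_name, na.rm = TRUE)
      Fill with the median:
      data$column_name[is.na(data$column_name)] <- median(data$column_name, na.rm = TRUE)
      Fill with a specific value:
      data$column_name[is.na(data$column_name)] <- 0 # or another value
    4. Replace specific values with NA
      If certain values in the data (such as 999) stand for missing values, replace them with NA:
      data[data == 999] <- NA
    5. Save the cleaned data
      Save the processed data as a new CSV file:
      write.csv(data_cleaned, "cleaned_data.csv", row.names = FALSE)
      With these methods you can handle missing values in CSV files flexibly and make sure the data quality is suitable for later analysis.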
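    The steps above can be tried end to end on a small inline table instead of a file. In this sketch the column names and values are invented for illustration, and 999 is assumed to be a sentinel for "missing" in the score column.

    ```r
    # Inline CSV content (hypothetical data) in place of data.csv
    csv_lines <- c(
      "id,score,city",
      "1,90,Beijing",
      "2,NULL,Shanghai",
      "3,?,Beijing",
      "4,999,?",
      "5,70,Shenzhen"
    )

    # Step 1: read, treating "NULL" and "?" as NA
    data <- read.csv(text = csv_lines, na.strings = c("NULL", "?"))

    # Step 4 (done early here): turn the sentinel 999 into NA
    data$score[data$score %in% 999] <- NA

    # Step 2: count the missing values per column
    na_counts <- colSums(is.na(data))
    print(na_counts)

    # Step 3 (1): keep only complete rows
    complete_rows <- na.omit(data)

    # Step 3 (2): alternatively, fill missing scores with the column mean
    filled <- data
    filled$score[is.na(filled$score)] <- mean(filled$score, na.rm = TRUE)
    print(filled)
    ```

    Converting the sentinel before computing the mean matters: otherwise 999 would be averaged in as if it were a real score.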
  • Plotting for analysis of variance (ANOVA) in R

    # Load the required packages
    library(dplyr)
    library(ggplot2)
    library(car)
    library(effects)
    library(multcomp)
    library(gridExtra)
    
    # Set the random seed so that results are reproducible
    set.seed(123)
    
    # ----------------------------
    # One-way ANOVA plotting examples
    # ----------------------------
    
    # Use the built-in PlantGrowth dataset
    data(PlantGrowth)
    
    # 1. Box plot: distribution within each group
    p1 <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
      geom_boxplot(fill = "lightblue", color = "black") +
      theme_minimal() +
      labs(
        title = "Box plot: plant weight by treatment group",
        x = "Treatment group",
        y = "Weight"
      )
    
    # 2. Means plot with confidence intervals
    summary_data <- PlantGrowth %>%
      group_by(group) %>%
      summarize(
        mean = mean(weight),
        sd = sd(weight),
        n = n(),
        se = sd / sqrt(n),
        lower = mean - qt(0.975, df = n - 1) * se,
        upper = mean + qt(0.975, df = n - 1) * se
      )
    
    p2 <- ggplot(summary_data, aes(x = group, y = mean, group = 1)) +
      geom_point(size = 3, color = "red") +
      geom_line(linetype = "dashed", color = "darkgrey", linewidth = 0.7) +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
      geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1, linewidth = 0.8) +
      theme_minimal() +
      labs(
        title = "Means plot with 95% confidence intervals",
        x = "Treatment group",
        y = "Mean weight"
      )
    
    # 3. Jittered scatter plot
    p3 <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
      geom_jitter(width = 0.2, alpha = 0.6, color = "blue") +
      stat_summary(fun = "mean", geom = "point", color = "red", size = 4) +
      theme_minimal() +
      labs(
        title = "Scatter plot with group means",
        x = "Treatment group",
        y = "Weight"
      )
    
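    # ----------------------------
    # Two-way ANOVA plotting examples
    # ----------------------------
    
    # Create an example dataset
    n_per_group <- 10  # sample size per group
    
    # Create the factor combinations
    # (explicit levels keep the treatment order Control, Low, High on the x axis)
    mydata <- expand.grid(
      gender = factor(c("Male", "Female")),
      treatment = factor(c("Control", "Low", "High"),
                         levels = c("Control", "Low", "High"))
    )
    
    # Define a mean for each group (includes main effects and an interaction)
    means_matrix <- matrix(
      c(50, 60, 75,  # means for males at each treatment level
        55, 75, 70), # means for females at each treatment level
      nrow = 2, byrow = TRUE,
      dimnames = list(c("Male", "Female"), c("Control", "Low", "High"))
    )
    
    # Generate the data
    mydata$mean <- as.vector(means_matrix)
    mydata <- mydata[rep(1:nrow(mydata), each = n_per_group), ]
    mydata$score <- mydata$mean + rnorm(nrow(mydata), 0, 8)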
    
    # 4. Interaction plot
    model <- aov(score ~ gender * treatment, data = mydata)
    effect_data <- as.data.frame(effect("gender:treatment", model))
    
    p4 <- ggplot(effect_data, aes(x = treatment, y = fit, group = gender, color = gender)) +
      geom_line(linewidth = 1.2) +
      geom_point(size = 3) +
      geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1, linewidth = 0.8) +
      theme_minimal() +
      labs(
        title = "Gender x treatment interaction plot",
        x = "Treatment level",
        y = "Predicted score",
        color = "Gender"
      ) +
      scale_color_manual(values = c("Male" = "blue", "Female" = "red"))
    
    # 5. Box plot for the two-factor design
    p5 <- ggplot(mydata, aes(x = treatment, y = score, fill = gender)) +
      geom_boxplot(position = position_dodge(0.8)) +
      theme_minimal() +
      labs(
        title = "Two-factor box plot",
        x = "Treatment level",
        y = "Score",
        fill = "Gender"
      )
    
    # 6. Grouped scatter plot
    p6 <- ggplot(mydata, aes(x = treatment, y = score, color = gender)) +
      geom_jitter(alpha = 0.6, position = position_jitterdodge(jitter.width = 0.2, dodge.width = 0.8)) +
      stat_summary(fun = "mean", geom = "point", size = 4, position = position_dodge(0.8)) +
      theme_minimal() +
      labs(
        title = "Two-factor scatter plot",
        x = "Treatment level",
        y = "Score",
        color = "Gender"
      )
    
    # 7. Residual diagnostic plots
    fit <- aov(weight ~ group, data = PlantGrowth)
    par(mfrow = c(2, 2))
    plot(fit)
    par(mfrow = c(1, 1))
    
    # Show all the ggplot figures
    grid.arrange(p1, p2, p3, p4, p5, p6, ncol = 2)
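    The script above draws the plots but never prints the test results they accompany. A minimal base-R sketch of the one-way analysis on the same PlantGrowth data follows. It fits the model, extracts the overall F-test p-value, and runs Tukey's HSD for the pairwise comparisons. The variable names `anova_table`, `p_value` and `tukey` are introduced here for illustration.

    ```r
    # One-way ANOVA: does mean plant weight differ between groups?
    fit <- aov(weight ~ group, data = PlantGrowth)

    # ANOVA table (degrees of freedom, sums of squares, F statistic, p-value)
    anova_table <- summary(fit)[[1]]
    print(anova_table)

    # p-value of the overall F test (first row is the group effect)
    p_value <- anova_table[["Pr(>F)"]][1]

    # Pairwise comparisons with Tukey's honest significant difference
    tukey <- TukeyHSD(fit)
    print(tukey)
    ```

    If the overall test is significant, the Tukey table indicates which pairs of groups drive the difference. This is the kind of conclusion the means plot with confidence intervals is meant to illustrate.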
    
  • Daily and weekly candlestick charts for CNOOC (600938) in R, January to September 2025

    Fetching stock data with the Python library baostock – 网事-树莓派

    ---
    title: "R Notebook"
    output:
      html_document:
        df_print: paged
    ---
    
    ```{r}
    library(readr)
    library(quantmod)
    
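    # Read the data
    stock <- read_csv("stock_data_600938.csv")
    
    # Build the xts object from the numeric columns only, indexed by date
    # (drops the first two columns, that is, the date and non-numeric columns such as code)
    stock_xts <- xts(stock[,-c(1,2)],                               # keep only the numeric columns
                     order.by = as.Date(stock$date))      # use the date as the index
    tail(stock_xts)
    
    # Draw the daily candlestick chart
    chartSeries(stock_xts, # drawn from the open/high/low/close columns
                theme = "white", 
                name = "CNOOC candlestick chart with Bollinger Bands",
                up.col = "red",dn.col = "green",
                TA = c(addVo(),      # add the volume panel
                       addBBands())) # add Bollinger Bands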
    
    ```
    # Weekly stock price chart
    ```{r}
    stock_xts_w <- to.weekly(stock_xts)
    tail(stock_xts_w)
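    chartSeries(stock_xts_w, # drawn from the weekly open/high/low/close columns
                theme = "white", 
                name = "CNOOC weekly candlestick chart with Bollinger Bands",
                up.col = "red",dn.col = "green",
                TA = c(addVo(),      # add the volume panel
                       addBBands())) # add Bollinger Bands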
    ```
                open  high   low close  preclose   volume    amount adjustflag   turn tradestatus  pctChg isST
    2025-09-23 26.12 26.37 26.04 26.11 26.16 33420344 874462608 3 1.1177 1 -0.1911 0
    2025-09-24 26.20 26.66 26.20 26.33 26.11 35694697 941589316 3 1.1938 1 0.8426 0
    2025-09-25 26.34 26.71 26.32 26.60 26.33 44561236 1183035539 3 1.4903 1 1.0254 0
    2025-09-26 26.55 26.69 26.47 26.50 26.60 22192550 589291301 3 0.7422 1 -0.3759 0
    2025-09-29 26.55 26.58 26.21 26.40 26.50 31542708 831091786 3 1.0549 1 -0.3774 0
    2025-09-30 26.16 26.19 25.88 26.13 26.40 51791066 1348014475 3 1.7321 1 -1.0227 0

    stock_xts.Open stock_xts.High stock_xts.Low stock_xts.Close stock_xts.Volume
    2025-08-29 25.75 26.31 25.47 25.68 281303365
    2025-09-05 25.69 26.57 25.41 25.74 309074615
    2025-09-12 25.58 26.36 25.55 26.30 193722674
    2025-09-19 26.30 26.90 26.00 26.40 232149346
    2025-09-26 26.41 26.71 26.04 26.50 163340642
    2025-09-30 26.55 26.58 25.88 26.13 83333774
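    To make the weekly aggregation concrete, the sketch below reproduces it in base R on the six daily rows printed above: the first open, highest high, lowest low, last close and summed volume of each calendar week. This only illustrates the idea behind to.weekly(); it is not how xts implements it. The first week is incomplete here because the tail output starts on 2025-09-23, so only the last week can be compared with the weekly table.

    ```r
    # The six daily rows shown in the tail() output above
    daily <- data.frame(
      date   = as.Date(c("2025-09-23", "2025-09-24", "2025-09-25",
                         "2025-09-26", "2025-09-29", "2025-09-30")),
      open   = c(26.12, 26.20, 26.34, 26.55, 26.55, 26.16),
      high   = c(26.37, 26.66, 26.71, 26.69, 26.58, 26.19),
      low    = c(26.04, 26.20, 26.32, 26.47, 26.21, 25.88),
      close  = c(26.11, 26.33, 26.60, 26.50, 26.40, 26.13),
      volume = c(33420344, 35694697, 44561236, 22192550, 31542708, 51791066)
    )

    # Label each row with the Monday that starts its week
    week_id <- cut(daily$date, breaks = "week")

    # Collapse each week into one open/high/low/close/volume row
    weekly <- do.call(rbind, lapply(split(daily, week_id), function(w) {
      data.frame(
        date   = max(w$date),        # last trading day in the week
        open   = w$open[1],          # first open
        high   = max(w$high),        # highest high
        low    = min(w$low),         # lowest low
        close  = w$close[nrow(w)],   # last close
        volume = sum(w$volume)       # total volume
      )
    }))
    print(weekly)
    ```

    The second row matches the 2025-09-30 line of the weekly output (open 26.55, high 26.58, low 25.88, close 26.13, volume 83333774).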

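  • Drawing stock candlestick charts in R

    Historical data for CNOOC was fetched with the Python library baostock and saved as stock_data_numeric.csv for later use.
    Fetching stock data with the Python library baostock – 网事-树莓派

    
    library(readr)
    stock_data_numeric <- read_csv("python/stock_data_numeric.csv")
    View(stock_data_numeric)
    # Install and load the quantmod package
    #install.packages("quantmod")
    library(quantmod)
    
    # Keep the numeric columns and index them by date
    stock_data_numeric <- xts(stock_data_numeric[,-c(1,2)],order.by = as.Date(stock_data_numeric$date))
    # Draw the price chart with chartSeries
    chartSeries(stock_data_numeric,
                theme = "white", # use the white theme
                name = "CNOOC share price with Bollinger Bands",
                TA = c(addVo(),      # add the volume panel
                       addBBands())) # add the Bollinger Bands indicator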
    
    
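  • Introduction to R in Action, Third Edition

    R in Action, Third Edition

    Built around the R language and the Tidyverse toolset, the book gives a comprehensive introduction to data processing, analysis, and visualization. It has 22 chapters in 5 parts, with appendices covering 7 supplementary topics and 1 online bonus chapter. Part 1 covers the basics: an introduction to R, creating datasets, basic and advanced data management, and getting started with graphs, helping readers become familiar with the R environment, its data structures, and basic operations. Part 2 introduces basic methods, including basic graphs and basic statistics, with a variety of chart types plus descriptive statistics and correlation analysis. Part 3 explores intermediate methods such as regression, analysis of variance, and power analysis, with in-depth coverage of several statistical models and more advanced graphics. Part 4 covers advanced methods, including generalized linear models, principal component analysis, and other complex statistical techniques. Part 5 extends the reader's skills with advanced graphics, programming, creating dynamic reports, and building packages. The book emphasizes practicality, using many worked examples to show how R solves real data problems, and introduces a large number of R packages. It suits data analysts at every level.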

  • R code for grouped summaries of the mtcars dataset

    library(magrittr)
    library(dplyr)
    mtcars
    mtcars %>%
      group_by(cyl, gear) %>%
      summarise_all(list(mean), na.rm=TRUE)
    
    
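    This R code performs a grouped summary of the mtcars dataset. Its function and purpose are as follows:

    1. Loading packages

    • library(magrittr): loads the magrittr package to provide the pipe operator %>%. The operator passes the result of one step to the next function, which keeps code concise (for example, A %>% f() %>% g() is equivalent to g(f(A))).
    • library(dplyr): loads the dplyr package for data manipulation (grouping, summarising, and so on).

    2. Viewing and processing the data

    • mtcars: prints the mtcars dataset directly (a dataset built into R with 32 rows and 11 columns, recording fuel economy, number of cylinders, horsepower, and other parameters for different car models).
    • Piping and grouped summary
      mtcars %>%
        group_by(cyl, gear) %>%
        summarise_all(list(mean), na.rm=TRUE)
    • group_by(cyl, gear): groups the data by the two variables cyl (number of cylinders) and gear (number of gears). For example, all 4-cylinder, 3-gear cars form one group, all 6-cylinder, 4-gear cars form another, and so on.
    • summarise_all(list(mean), na.rm=TRUE): computes the mean of every non-grouping column within each group (in mtcars all of them are numeric, such as mpg and disp), ignoring missing values (na.rm=TRUE). summarise_all applies the given function to all columns, and list(mean) selects mean as the summary function. Since dplyr 1.0, summarise_all() has been superseded by summarise() combined with across().

    3. Core purpose of the code

    • Analysis goal: the grouped summary shows the average level of each performance measure for every combination of cylinder count (cyl) and gear count (gear). For example, it allows comparing the mean fuel economy (mpg) and mean displacement (disp) of 4-cylinder, 3-gear cars with those of 6-cylinder, 5-gear cars.
    • Strengths of the code: combining dplyr with the pipe operator makes the processing flow clear (group, then summarise). The code is readable and follows the direction in which the data flows, from the raw data to the grouped result.

    4. Example of the expected output

    Suppose grouping produced the following result (illustrative values, not actual output):

    cyl  gear  mpg_mean  disp_mean  hp_mean
    4    3     22.5      108.0      93.0
    4    4     26.0      120.5      110.0
    6    4     19.2      167.6      123.0
    8    3     15.8      350.0      200.0

    This result helps characterise the performance of different car configurations and provides a basis for later statistical analysis or visualization.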
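    To see the real values rather than the illustrative table, the same grouped means can be computed with base R's aggregate(), which needs no extra packages. The sketch below also shows the across() form that replaces summarise_all() in current dplyr. That part runs only if dplyr is installed. The variable name `by_group` is introduced here for illustration.

    ```r
    # Base R: mean of every other column for each cyl/gear combination
    by_group <- aggregate(. ~ cyl + gear, data = mtcars, FUN = mean)
    print(by_group)

    # Look up one group, for example the 4-cylinder, 3-gear cars
    by_group[by_group$cyl == 4 & by_group$gear == 3, c("cyl", "gear", "mpg", "disp", "hp")]

    # dplyr equivalent using across() (assumes dplyr >= 1.0 is installed)
    if (requireNamespace("dplyr", quietly = TRUE)) {
      library(dplyr)
      mtcars %>%
        group_by(cyl, gear) %>%
        summarise(across(everything(), function(x) mean(x, na.rm = TRUE)),
                  .groups = "drop") %>%
        print()
    }
    ```

    Both versions return one row per cyl/gear combination that actually occurs in the data, so combinations with no cars do not appear.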