R包开发:xmetrics

1 缘由

R包xmetrics定位于辅助计量经济学和统计学课程进行高效的、可重现的(reproducable)教学内容准备和演示。

R包命名的原则应该坚持:a.词形或发音与表意一致性,也即与开发R包核心功能定位的一致;b.不能太范化或通用,避免后期用户搜索的混淆,或推广传播的混乱。

拟定开发的R包命名xmetrics,音近于计量经济学英文EconometricsX也有多种用途或神通广大的含义,而且metrics本身就是测量或度量的含义。通过google搜索关键词“xmetrics”的重合度或范化度,仅发现一款游泳穿戴设备名为“Xmetrics”(见网页)。继续google搜索关键词“R xmetrics”,发现一个定位为机器学习(Machine Learning)的R包命名为“Metrics”(见CRAN)。

功能类似R包主要有: equatiomatic(见github仓库datalorax/equatiomatic

The goal of equatiomatic is to reduce the pain associated with writing LaTeX code from a fitted model. In the future, the package aims to support any model supported by broom.

  • 优点:简单、容易上手;支持较多的几类计量模型。

  • 不足:功能相对较少(见下面)

2 开发思路

2.1 主要功能

math equation输出需要考虑的几个方面:

  • 支持纯latex符号公式(symbol equation)、模型结果数值化(value equation)以及二者的混合;

  • 支持计量经济学数学公式(math equation)的各类理论表达,如总体回归模型PRM、总体回归函数PRF、样本回归模型(SRM)、样本回归函数(SRF)等;

  • 支持多种数值结果形式输出,例如经典三行式(系数、标准误、t值),或者经典一行式(系数)。

  • 支持多种latex美化效果,包括alignalignedat等排列对齐环境;

  • 支持灵活个性化的参数符号(如\(\alpha,\beta,\gamma,\cdots\))和下标符号(如\(X_i, u_i, u_t\))选择等;

  • 支持多种模型估计方法,包括OLS回归、虚拟变量回归(ANOVA)等;

  • 与通用R包保持接口一致,如包broom等;

2.2 主要挑战

计量经济学的语言符号体系

  • 符号体系的标准化和一致性问题【低难度】

  • 数学证明和逻辑推导问题【工作强度大】。

2.2.1 理论公式输出

X <- c(paste0(rep(c("X","Z"),each=4),1:4), "fathedu", "mothedu")
Y <- "lwage"
Greek.g <- c("alpha","beta","lambda")
Greek.n <- c(4,4,2)
#type <- "srm"
Obs <- "i"
N.row <- 4
Cst <- F

总体回归模型PRM:

\[\begin{equation} \begin{alignedat}{999} lwage_i&=&&\alpha_{1}X1_i+&&\alpha_{2}X2_i+&&\alpha_{3}X3_i+&&\alpha_{4}X4_i\\&+&&\beta_{1}Z1_i+&&\beta_{2}Z2_i+&&\beta_{3}Z3_i+&&\beta_{4}Z4_i\\&+&&\lambda_{1}fathedu_i+&&\lambda_{2}mothedu_i+&&u_i \end{alignedat} \tag{2.1} \end{equation}\]

样本回归模型SRM:

srm_test <- lx.psm(x =X, y = Y, greek.g = Greek.g, greek.n = Greek.n,
       type = "srm", intercept = Cst, lm.label = "srm", 
       obs = Obs, n.row = N.row)

\[\begin{equation} \begin{alignedat}{999} lwage_i&=&&\hat{\alpha}_{1}X1_i+&&\hat{\alpha}_{2}X2_i+&&\hat{\alpha}_{3}X3_i+&&\hat{\alpha}_{4}X4_i\\&+&&\hat{\beta}_{1}Z1_i+&&\hat{\beta}_{2}Z2_i+&&\hat{\beta}_{3}Z3_i+&&\hat{\beta}_{4}Z4_i\\&+&&\hat{\lambda}_{1}fathedu_i+&&\hat{\lambda}_{2}mothedu_i+&&e_i \end{alignedat} \tag{2.2} \end{equation}\]

样本回归函数SRF:

srf_test <- lx.psm(x =X, y = Y,greek.g = Greek.g, greek.n = Greek.n,
       type = "srf", intercept = Cst, lm.label = "srf",
       obs = Obs, n.row = N.row )

\[\begin{equation} \begin{alignedat}{999} \widehat{lwage}_i&=&&\hat{\alpha}_{1}X1_i+&&\hat{\alpha}_{2}X2_i+&&\hat{\alpha}_{3}X3_i+&&\hat{\alpha}_{4}X4_i\\&+&&\hat{\beta}_{1}Z1_i+&&\hat{\beta}_{2}Z2_i+&&\hat{\beta}_{3}Z3_i+&&\hat{\beta}_{4}Z4_i\\&+&&\hat{\lambda}_{1}fathedu_i+&&\hat{\lambda}_{2}mothedu_i \end{alignedat} \tag{2.3} \end{equation}\]

2.2.2 数值公式输出

require("wooldridge")
mroz <- wooldridge::mroz %>%
  as_tibble() %>%
  select(lwage, educ,exper, 
         fatheduc,motheduc,everything()) %>%
  filter(!is.na(wage))

mod_origin <- formula(lwage ~ educ + nwifeinc +exper+I(exper^2) + I(exper^2*city)  )

ols_origin <- lm(formula = mod_origin, data = mroz)

默认形式:

lx_out<- lx.est(lm.mod = mod_origin, lm.dt = mroz)

\[\begin{alignedat}{999} \widehat{lwage}&=&&-0.53&&+0.10educ_i&&+0.01nwifeinc_i\\&(s)&&0.2011&&0.0148&&0.0032\\&(t)&&-2.61&&+6.67&&+1.59\\&(cont.)&&+0.04exper_i&&-0.00exper^2_i&&+0.00exper^2*city_i\\&(s)&&0.0132&&0.0004&&0.0002\\&(t)&&+3.23&&-2.19&&+0.79 \end{alignedat}\]

srm形式:

lx_out<- lx.est(lm.mod = mod_origin, lm.dt = mroz, style = "srm")

\[\begin{alignedat}{999} {lwage}&=&&-0.53&&+0.10educ_i&&+0.01nwifeinc_i\\&(s)&&0.2011&&0.0148&&0.0032\\&(t)&&-2.61&&+6.67&&+1.59\\&(cont.)&&+0.04exper_i&&-0.00exper^2_i&&+0.00exper^2*city_i&&+e_i\\&(s)&&0.0132&&0.0004&&0.0002\\&(t)&&+3.23&&-2.19&&+0.79 \end{alignedat}\]

一行形式:

lx_out<- lx.est(lm.mod = mod_origin, lm.dt = mroz, style = "srm", opt = c("p"))

\[\begin{alignedat}{999} {lwage}&=&&-0.53&&+0.10educ_i&&+0.01nwifeinc_i\\&(p)&&0.0093&&0.0000&&0.1116\\&(cont.)&&+0.04exper_i&&-0.00exper^2_i&&+0.00exper^2*city_i&&+e_i\\&(p)&&0.0014&&0.0288&&0.4322 \end{alignedat}\]

3 一些工具函数

3.1 将xls文件高保真地转换为xlsx文件

参考资料1:geosalz 源代码;参考资料2:“队长问答”;参考资料3:博客文章

函数作用:将本地文件夹下的.xls文件批量转换为.xlsx文件。适用于windowns操作系统下,具体会调用Microsoft的本地电脑程序端。

函数名称:convert_xls_as_xlsx(input_dir, export_dir)

使用场景:“D:/github/article-west/R/xls2xlsx.R”;以及“D:/github/article-west/data-set-maintain.Rmd”

convert_xls_as_xlsx(input_dir = "d:/github/article-west/data/v4-cost-revenue/01-raw/",  
                    export_dir = "d:/github/article-west/data/v4-cost-revenue/001-out/")

注意可能的提示(message):

Found 2 versions of 'excelcnv.exe':
  C:/Program Files/Microsoft Office/Updates/Download/PackageFiles/8BB798B7-EFF4-4781-AD0F-DE53892ADC7D/root/Office16/excelcnv.exe
  C:/Program Files/Microsoft Office/root/Office16/excelcnv.exe

根据本地电脑的实际情况,很可能需要修改两个地方:

  • 本地电脑office的安装路径:safe_office_folder()函数的路径参数office_path = "C:/Program Files/Microsoft Office")

  • 可能有office更新版本:get_excelcnv_exe()函数的输出结果paths[2]

3.2 df数据列元素进行快速粘合输出

Related