docklet/tools/R_demo.ipynb

214 lines
5.6 KiB
Plaintext
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 一个R语言实现的爬虫爬取拉手网美食信息"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"点击下边的cell点击上方工具栏里的执行图标即可执行代码块看到输出结果。代码块左边的In[]出现In[*]表示代码正在执行"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"library(XML)\n",
"\n",
"giveNames = function(rootNode){\n",
" names <- xpathSApply(rootNode,\"//h3/a[@class='goods-name']\",xmlValue)\n",
" names\n",
"}\n",
"\n",
"givesevices = function(rootNode){\n",
" sevices <- xpathSApply(rootNode,\"//h3/a[@class='goods-text']\",xmlValue)\n",
" sevices\n",
"}\n",
"\n",
"\n",
"giveprices = function(rootNode){\n",
" prices <- xpathSApply(rootNode,\"//div/span[@class='price']\",xmlValue)\n",
" prices\n",
"}\n",
"\n",
"\n",
"givemoney = function(rootNode){\n",
" money <- xpathSApply(rootNode,\"//div/span[@class='money']\",xmlValue)\n",
" money\n",
"}\n",
"\n",
"\n",
"giveplaces = function(rootNode){\n",
" places <- xpathSApply(rootNode,\"//a/span[@class='goods-place']\",xmlValue)\n",
" places\n",
"}\n",
"\n",
"\n",
"getmeituan = function(URL){\n",
" Sys.sleep(runif(1,1,2))\n",
" doc<-htmlParse(URL[1],encoding=\"UTF-8\")\n",
" rootNode<-xmlRoot(doc)\n",
" data.frame(\n",
" Names=giveNames(rootNode), #店名\n",
" services=givesevices(rootNode), #服务\n",
" prices=giveprices(rootNode), #现价\n",
" money=givemoney(rootNode), #原价\n",
" places=giveplaces(rootNode) #地点\n",
" \n",
" )\n",
"}\n",
"\n",
"\n",
"URL = paste0(\"http://shenzhen.lashou.com/cate/meishi/page\",1:10)\n",
"\n",
"mainfunction = function(URL){\n",
" data = rbind(\n",
" getmeituan (URL[1]),\n",
" getmeituan (URL[2]),\n",
" getmeituan (URL[3]),\n",
" getmeituan (URL[4]),\n",
" getmeituan (URL[5])\n",
" )\n",
" \n",
" \n",
"}\n",
"ll=mainfunction(URL)\n",
"write.table(ll,\"result.txt\",row.names=FALSE)\n",
"ll\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# R语言的线性回归实例"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"输入数据"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#读入数据\n",
"x<- c(0.10,0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18,0.20,0.21,0.23)\n",
"y<-c(42.0,43.5,45.0,45.5,45.0,47.5,49,53,50,55,55,60)\n",
"#绘出 x 与 y 的散列图\n",
"plot(y~x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"执行该段代码,即可看到输出图形;从图中我们可以看出 y 和 x 存在线性相关性,可以进行线性回归分析:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model<-lm(y~x)\n",
"summary(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"我们通过 P 值(就是上面的 pr 那一列)来查看对应的解释变量 x 的显著性,通过将 p 值与 0.05 进行比较,若改值小于 0.05,就可以说该变量与被解释变量存在显著的相关性。\n",
"\n",
"Multiple R-squared 和 Adjusted R-squared 这两个值,就是我们常称为”拟合优度“和”修正的拟合优度“,是指回归方程对样本的拟合程度,这里我们可以看到,修正的拟合优度为 0.9429,表示拟合程度超过五成,这个值越高越好。\n",
"\n",
"最后,看下 F-statistic也就是常说的 F 统计量,也称为 F 检验,常用语判断方程整体的显著性实验,其 p 值为 9.505e-08显然小于 0.05,我们可以认为方程在 P=0.05 的水平上是通过显著性检验的。\n",
"\n",
"从上面我们看出我们的线性回归效果不错,那么我们可以利用拟合方程进行分类,或者预测。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"newX<-data.frame(x=0.16)\n",
"predict(model,newdata=newX,interval=\"prediction\",level=0.95)#interval=”prediction“ level指定预测的置信区间"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# R语言的逻辑回归示例"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"counts <- c(18,17,15,20,10,20,25,13,12)\n",
"outcome <- gl(3,1,9)\n",
"treatment <- gl(3,3)\n",
"print(d.AD <- data.frame(treatment, outcome, counts))\n",
"glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())\n",
"anova(glm.D93)\n",
"summary(glm.D93)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.2.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}