Some bugs in 《Python数据分析与挖掘实战》

A summary of the problems I ran into while working through 《Python数据分析与挖掘实战》. Most of them are shortcomings of the book or bugs in its programs.

Reading Excel files with pandas

On P36 the book reads an Excel file like this (the call first appears on P27):

data = pd.read_excel(catering_sale, index_col = u'日期')
This raises an error:
TypeError: read_excel() takes exactly 2 arguments (1 given)
So I looked up the definition of read_excel:
def read_excel(io, sheetname, **kwds):
It requires two arguments, and io and sheetname are documented as follows:

io : string, file-like object or xlrd workbook
    If a string, expected to be a path to xls or xlsx file
sheetname : string
     Name of Excel sheet

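So I wrote:
data = pd.read_excel(catering_sale, sheetname='catering_sale', index_col = u'日期')
This failed because no sheet with that name could be found. Only later, on Stack Overflow, did I learn that this comes from a change in the pandas API and that sheetname should be set to 0, so I changed the call to:
data = pd.read_excel(catering_sale, sheetname=0, index_col = u'日期')
This works. This bug in the book cost me a lot of time.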
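Since the keyword itself depends on the pandas version (newer releases spell it `sheet_name` rather than `sheetname`), a small wrapper can pick whichever spelling the installed `read_excel` accepts. The following is only a sketch under assumptions: it needs Python 3 for `inspect.signature`, and `sheet_keyword` and `read_first_sheet` are hypothetical helper names, not part of pandas or the book.

```python
import inspect


def sheet_keyword(read_excel_func):
    """Return the keyword this read_excel accepts for selecting a sheet."""
    params = inspect.signature(read_excel_func).parameters
    # Newer pandas releases spell the keyword `sheet_name`;
    # the older definition quoted above spells it `sheetname`.
    return "sheet_name" if "sheet_name" in params else "sheetname"


def read_first_sheet(path, index_col=None):
    """Read the first sheet (position 0) of an Excel file."""
    import pandas as pd  # imported here so sheet_keyword() is usable on its own

    kwargs = {sheet_keyword(pd.read_excel): 0, "index_col": index_col}
    return pd.read_excel(path, **kwargs)
```

With this in place, the call from the book would become `data = read_first_sheet(catering_sale, index_col=u'日期')`.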

Chinese fonts in matplotlib plots

The book sets the Chinese font like this (P26):
plt.rcParams['font.sans-serif'] = ['SimHei']
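I don't know whether this works on Windows, but on my Ubuntu machine it had no effect, and matplotlib additionally complained that it could not find a sans-serif font. After a lot of searching on Google I finally solved it. Reference article: Ubuntu下matplotlib的中文显示

Edit /etc/matplotlib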

font.family: sans-serif
font.sans-serif: WenQuanYi Micro Hei, Bitstream Vera Sans, Lucida Grande, Verdana, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif

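This puts the wqy font at the front of the list. The wqy font must already be installed.

Clear the font cache

Without clearing the cache, the change has no effect. The cache is a directory, so rm needs -r:
rm -r ~/.cache/matplotlib/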

Font file extension

When matplotlib searches for system fonts it does not match ttc font files, so:

cd /usr/share/fonts/truetype/wqy
cp wqymicrohei.ttc wqymicrohei.ttf

Rerun the program and the Chinese text displays correctly.
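To check from Python whether matplotlib can now see the font, without editing the system configuration again, something like the following may help. It is a rough sketch: `pick_font` and `use_chinese_font` are hypothetical helper names, and the candidate list (WenQuanYi Micro Hei first, then SimHei) is an assumption based on the fonts mentioned above.

```python
def pick_font(preferred, available):
    """Return the first name in `preferred` that also occurs in `available`, else None."""
    available = set(available)
    for name in preferred:
        if name in available:
            return name
    return None


def use_chinese_font(preferred=("WenQuanYi Micro Hei", "SimHei")):
    """Put the first installed font from `preferred` at the front of font.sans-serif."""
    import matplotlib
    from matplotlib import font_manager

    # Family names of all TrueType fonts that matplotlib has indexed.
    installed = [f.name for f in font_manager.fontManager.ttflist]
    name = pick_font(preferred, installed)
    if name is not None:
        current = list(matplotlib.rcParams["font.sans-serif"])
        matplotlib.rcParams["font.family"] = "sans-serif"
        matplotlib.rcParams["font.sans-serif"] = [name] + current
    return name
```

If `use_chinese_font()` returns None, matplotlib has not indexed either font, which usually means the font is not installed or the font cache is stale.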

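The kind parameter of the pandas plot function

P55: D.plot(kind='box') fails. In the pandas version I was using, kind cannot be set to 'box'; the boxplot() function should be used instead.
The help text lists the values that kind accepts: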

kind : {'line', 'bar', 'barh', 'kde', 'density', 'scatter'}
    bar : vertical bar plot
    barh : horizontal bar plot
    kde/density : Kernel Density Estimation plot
    scatter: scatter plot
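Newer pandas releases do accept kind='box', so code that has to run on both can try plot first and fall back to boxplot(). This is a minimal sketch: `box_plot` is a hypothetical helper name, and it assumes that a release which does not know 'box' rejects it with a ValueError.

```python
def box_plot(frame):
    """Draw a box plot of `frame`, preferring plot(kind='box') when it is accepted."""
    try:
        return frame.plot(kind="box")
    except ValueError:
        # Assumption: a pandas release that does not support kind='box'
        # signals this with ValueError; fall back to DataFrame.boxplot().
        return frame.boxplot()
```

For the DataFrame D from the book, the call would then be `box_plot(D)`.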