Python计算KS值并绘制KS曲线-创新互联
更多大数据分析、建模等内容请关注公众号《bigdatamodeling》
成都创新互联专注于企业成都全网营销、网站重做改版、德州网站定制设计、自适应品牌网站建设、html5、商城网站开发、集团公司官网建设、成都外贸网站建设公司、高端网站制作、响应式网页设计等建站业务,价格优惠性价比高,为德州等各大城市提供网站开发制作服务。python实现KS曲线,相关使用方法请参考上篇博客-R语言实现KS曲线
代码如下:
####################### PlotKS ##########################
def PlotKS(preds, labels, n, asc):
# preds is score: asc=1
# preds is prob: asc=0
pred = preds # 预测值
bad = labels # 取1为bad, 0为good
ksds = DataFrame({'bad': bad, 'pred': pred})
ksds['good'] = 1 - ksds.bad
if asc == 1:
ksds1 = ksds.sort_values(by=['pred', 'bad'], ascending=[True, True])
elif asc == 0:
ksds1 = ksds.sort_values(by=['pred', 'bad'], ascending=[False, True])
ksds1.index = range(len(ksds1.pred))
ksds1['cumsum_good1'] = 1.0*ksds1.good.cumsum()/sum(ksds1.good)
ksds1['cumsum_bad1'] = 1.0*ksds1.bad.cumsum()/sum(ksds1.bad)
if asc == 1:
ksds2 = ksds.sort_values(by=['pred', 'bad'], ascending=[True, False])
elif asc == 0:
ksds2 = ksds.sort_values(by=['pred', 'bad'], ascending=[False, False])
ksds2.index = range(len(ksds2.pred))
ksds2['cumsum_good2'] = 1.0*ksds2.good.cumsum()/sum(ksds2.good)
ksds2['cumsum_bad2'] = 1.0*ksds2.bad.cumsum()/sum(ksds2.bad)
# ksds1 ksds2 -> average
ksds = ksds1[['cumsum_good1', 'cumsum_bad1']]
ksds['cumsum_good2'] = ksds2['cumsum_good2']
ksds['cumsum_bad2'] = ksds2['cumsum_bad2']
ksds['cumsum_good'] = (ksds['cumsum_good1'] + ksds['cumsum_good2'])/2
ksds['cumsum_bad'] = (ksds['cumsum_bad1'] + ksds['cumsum_bad2'])/2
# ks
ksds['ks'] = ksds['cumsum_bad'] - ksds['cumsum_good']
ksds['tile0'] = range(1, len(ksds.ks) + 1)
ksds['tile'] = 1.0*ksds['tile0']/len(ksds['tile0'])
qe = list(np.arange(0, 1, 1.0/n))
qe.append(1)
qe = qe[1:]
ks_index = Series(ksds.index)
ks_index = ks_index.quantile(q = qe)
ks_index = np.ceil(ks_index).astype(int)
ks_index = list(ks_index)
ksds = ksds.loc[ks_index]
ksds = ksds[['tile', 'cumsum_good', 'cumsum_bad', 'ks']]
ksds0 = np.array([[0, 0, 0, 0]])
ksds = np.concatenate([ksds0, ksds], axis=0)
ksds = DataFrame(ksds, columns=['tile', 'cumsum_good', 'cumsum_bad', 'ks'])
ks_value = ksds.ks.max()
ks_pop = ksds.tile[ksds.ks.idxmax()]
print ('ks_value is ' + str(np.round(ks_value, 4)) + ' at pop = ' + str(np.round(ks_pop, 4)))
# chart
plt.plot(ksds.tile, ksds.cumsum_good, label='cum_good',
color='blue', linestyle='-', linewidth=2)
plt.plot(ksds.tile, ksds.cumsum_bad, label='cum_bad',
color='red', linestyle='-', linewidth=2)
plt.plot(ksds.tile, ksds.ks, label='ks',
color='green', linestyle='-', linewidth=2)
plt.axvline(ks_pop, color='gray', linestyle='--')
plt.axhline(ks_value, color='green', linestyle='--')
plt.axhline(ksds.loc[ksds.ks.idxmax(), 'cumsum_good'], color='blue', linestyle='--')
plt.axhline(ksds.loc[ksds.ks.idxmax(),'cumsum_bad'], color='red', linestyle='--')
plt.title('KS=%s ' %np.round(ks_value, 4) +
'at Pop=%s' %np.round(ks_pop, 4), fontsize=15)
return ksds
####################### over ##########################
作图效果如下:
另外有需要云服务器可以了解下创新互联scvps.cn,海内外云服务器15元起步,三天无理由+7*72小时售后在线,公司持有idc许可证,提供“云服务器、裸金属服务器、高防服务器、香港服务器、美国服务器、虚拟主机、免备案服务器”等云主机租用服务以及企业上云的综合解决方案,具有“安全稳定、简单易用、服务可用性高、性价比高”等特点与优势,专为企业上云打造定制,能够满足用户丰富、多元化的应用场景需求。
本文题目:Python计算KS值并绘制KS曲线-创新互联
文章网址:http://scyanting.com/article/cceopc.html