kmp函数python Kmp算法

Python 和 C++ 下字符串查找速度对比,你觉得Python适合算法竞赛吗

一位同学最近在备战一场算法竞赛,语言误选了 Python ,无奈只能着手对常见场景进行语言迁移。而字符串查找的场景在算法竞赛中时有出现。本文即对此场景在 Python 和竞赛常用语言 C++ 下的速度进行对比,并提供相关参数和运行结果供他人参考。

创新互联主营阳朔网站建设的网络公司,主营网站建设方案,app软件开发公司,阳朔h5小程序制作搭建,阳朔网站营销推广欢迎阳朔等地区企业咨询

本次实测设置两个场景:场景 1 的源串字符分布使用伪随机数生成器生成,表示字符串查找的平均情况;场景 2 的源串可连续分割成 20,000 个长度为 50 的字符片段,其中第 15,001 个即为模式串,形如“ab…b”(1 个“a”,49 个 “b”),其余的字符片段形如“ab…c”(1 个“a”,48 个“b”,1 个“c”)。

本次实测中,Python 语言使用内置类型 str 的 .find() 成员函数,C++ 语言分别使用 string 类的 .find() 成员函数、 strstr 标准库函数和用户实现的 KMP 算法。

IPython 的 %timeit 魔法命令可以输出代码多次执行的平均时间和标准差,在此取平均时间。C++ 的代码对每个模式串固定运行 1,000 次后取平均时间。

以下时间若无特别说明,均以微秒为单位,保留到整数位。

* 原输出为“2.63 ms”。IPython 的 %timeit 输出的均值保留 3 位有效数字,由于此时间已超过 1 毫秒,微秒位被舍弃。此处仍以微秒作单位,数值记为“2630”。

本次实测时使用的设备硬件上劣于算法竞赛中的标准配置机器,实测结果中的“绝对数值”参考性较低。

根据上表中的结果,在给定环境和相关参数条件下,场景 1 中 Python 的运行时间大约为 C++ 中 string::find 的五分之一,与 std:strstr 接近;而在场景 2 中 Python 的运行时间明显增长,但 C++ 的前两种测试方法的运行时间与先前接近甚至更短。四次测试中,C++ 的用户实现的 KMP 算法运行时间均较长,长于同条件下 Python 的情况。

Python 中的内置类型 str 的快速查找( .find() )和计数( .count() )算法基于 Boyer-Moore 算法 和 Horspool 算法 的混合,其中后者是前者的简化,而前者与 Knuth-Morris-Pratt 算法 有关。

有关 C++ 的 string::find 比 std::strstr 运行时间长的相关情况,参见 Bug 66414 - string::find ten times slower than strstr 。

Why do you think strstr should be slower than all the others? Do you know what algorithm strstr uses? I think it's quite likely that strstr uses a fine-tuned, processor-specific, assembly-coded algorithm of the KMP type or better. In which case you don't stand a chance of out-performing it in C for such small benchmarks.

KMP 算法并非是所有线性复杂度算法中最快的。在不同的环境(软硬件、测试数据等)下,KMP 与其变种乃至其他线性复杂度算法,孰优孰劣都无法判断。编译器在设计时考虑到诸多可能的因素,尽可能使不同环境下都能有相对较优的策略来得到结果。因而,在保证结果正确的情况下,与其根据算法原理自行编写,不如直接使用标准库中提供的函数。

同时本次实测也在运行时间角度再次印证 Python 并不适合在算法竞赛中取得高成绩的说法,你们觉得呢?平仑区留下你的看法。

python 报错

报错

第二段的意思是说:提示这意味着已将OpenMP运行时的多个副本链接到程序中。 这很危险,因为它会降低性能或导致错误的结果。 最好的办法是确保仅将单个OpenMP运行时链接到该流程中,例如 通过避免在任何库中静态链接OpenMP运行时。 作为不安全,不受支持,未记录的解决方法,您可以设置环境变量KMP_DUPLICATE_LIB_OK = TRUE以允许程序继续执行,但可能导致崩溃或无提示地产生错误的结果。

它提示设置环境变量,因此可以加入一下语句,然后即可成功运行:

我的KMP算法做出来了,可是居然运行时间比普通匹配还慢??求高手解答,帮忙修改一下!

①你的KMP代码里很多东西导致KMP算法变慢:

1、计算主字符串长度很浪费时间,可以不用计算的,事先开辟一个大空间。

2、new int[len+1];很浪费时间,事先开辟大空间,或者用多级内存管理。

3、delete []next很浪费时间。

②但是①说的不是本质,真正的原因是:KMP的优点是避免重复匹配的记忆功能,缺点是启动慢构造next,这就导致如果要被匹配的主字符串太短(少于1k个字符都算短的)。

而朴素算法启动不需要时间。

③另一方面,你的main中主字符串和匹配字符串没有相似性(只在最后2个字符串才进行了一次大的返回),而KMP的优点就在于处理相似字符串匹配,相似度越高,字符串越长,匹配效果越好。

④我改了下你的代码,增加的主字符串的长度,提高了字符串相似程度,结果就是KMP的时间比朴素算法要好接近30%:

Time to do Index loops is 32.031

Time to do KMP loops is 23.609

⑤代码如下:

#includetime.h

#include iostream.h

#include string.h

int Index_BF ( char S [ ], char T [ ], int pos )

{

int i = pos, j = 0;

while ( S[i+j] != '\0' T[j] != '\0')

if ( S[i+j] == T[j] )

j ++;

else

{

i ++; j = 0; }

if ( T[j] == '\0')

return i;

else

return -1;

}

void get_nextval(const char *T, int next[])

{

int j = 0, k = -1;

next[0] = -1;

while ( T[j/*+1*/] != '\0' )

{

if (k == -1 || T[j] == T[k])

{

++j; ++k;

if (T[j]!=T[k])

next[j] = k;

else

next[j] = next[k];

}// if

else

k = next[k];

}

}

int KMP(const char *Text,const char* Pattern) //const 表示函数内部不会改变这个参数的值。

{

if( !Text||!Pattern|| Pattern[0]=='\0' || Text[0]=='\0' )//

return -1;//空指针或空串,返回-1。

static int next[50];

get_nextval(Pattern,next);//求Pattern的next函数值

static int index,i,j;

index=i=j=0;

for(;Text[i]!='\0' Pattern[j]!='\0';)

{

if(Text[i]== Pattern[j])

{

++i;// 继续比较后继字符

++j;

}

else

{

index += j-next[j];

if(next[j]!=-1)

j=next[j];// 模式串向右移动

else

{

j=0;

++i;

}

}

}

//delete []next;

if(Pattern[j]=='\0')

return index;// 匹配成功

else

return -1;

}

int main()//abCabCad

{

int i;

char* text= "jegiradCadCegjirjgirjgirjfewijfejifgjadCadCadCadCadCadCadCadCadCadCadCadCadCadCerigjirjgirejgirejigjreigreigjreigjirgjeigjirjgigijirjeihjirhjirjhirjgijerigjir\

fewfjiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergergergregregregergregregergr\

jegiradCadCegjirjgirjgirjfewijfejifgjadCadCadCadCadCadCadCadCadCadCadCadCadCadCerigjirjgirejgirejigjreigreigjreigjirgjeigjirjgigijirjeihjirhjirjhirjgijerigjir\

fewfjiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergergergregregregergregregergr\

jegiradCadCegjirjgirjgirjfewijfejifgjadCadCadCadCadCadCadCadCadCadCadCadCadCadCerigjirjgirejgirejigjreigreigjreigjirgjeigjirjgigijirjeihjirhjirjhirjgijerigjir\

fewfjiadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttcc\

adCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttcc\

adCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttccadCadCadtttcc\

fewieiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeffefiwejfijiwfjiewjfijeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiegirjgirjgirjigjierjgijrighrjigjeigjadCadCadjreigjrijgirejgijreigjerjigejrbababCabCadcadewfewgfgergregegrgergeiwjjgiejijriegjirigjadCadCadtttccc\

adCadCadtttccctctc";

char*pattern="adCadCadtttccctctc";

clock_t start, finish;

double duration;

//普通匹配算法

cout( "Time to do Index loops is ") ;

start = clock();

for(int k=0;k50000;k++)

{

i=Index_BF(text,pattern,1);

}

finish = clock();

duration = (double)(finish - start) / CLOCKS_PER_SEC;

cout( "%f seconds\n", duration )endl;

//KMP匹配算法

cout( "Time to do KMP loops is ")endl;

start = clock();

for(int j=0;j50000;j++)

{

i=KMP(text,pattern);

}

finish = clock();

duration = (double)(finish - start) / CLOCKS_PER_SEC;

cout( "%f seconds\n", duration )endl;

cinfinish;

return 0;

}

python更改文本内容

用kmp算法预处理list中的字符串

处理文本的第1列 组成1个大字符串

用kmp处理第1列组成的字符串 写新文本

我的思路 你可以试一下 文本很大的话 估计需要不少优化


当前标题:kmp函数python Kmp算法
网页链接:http://scyanting.com/article/dodjioc.html