1. Read in the string to be analyzed
str='''We don't talk anymoreWe don't talk anymoreWe don't talk anymoreLike we used to do We don't laugh anymoreWhat was all of it for? We don't talk anymore Like we used to doI just heard you found the one you've been lookin'The one you been looking forI wish i would've konwn that wasn't me Cause even after all this time i still wonderWhy i can't move on? Just the way you dance so easliy Don't wanna know The kinda dress you're wearin' tonightIf he's holdin' onto you so tightThe way i did beforeI overdosed Should've known your love was gameNow I can't get'cha out of my brainOoh it's such a shameWe don't talk anymoreWe don't talk anymore We don't talk anymoreLike we used to do We don't laugh anymoreWhat was all of it for?We don't talk anymore Like we used to doI just hope you'r lyin' next to somebodyKnow it's hard to love ya like meMust be a good reason that you're goneEvery now and thenI think you might want me to come show up your doorBut I'm just too afraid that i'll be worngDon't wanna know If you'ra lookin' into her eyesIf she's holdin' onto you so tightThe way i did beforeI overdosedShould've know your love was a game Now I can't get'cha out of my brainOoh it's such a shameWe don't talk anymoreWe don't talk anymoreWe don't talk anymoreLike we used to do We don't laugh anymoreWhat was all of it for? We don't talk anymoreLike we used to doLike we used to doDon't wanna knowThe kinda dress you're wearin' tonightIf he's givin' it to you just rightThe way i did beforeI overdosedShould've know your love was a game Now I can't get'cha out of my brainOoh it's such a shameWe don't talk anymoreWe don't talk anymoreWe don't talk anymoreLike we used to do We don't laugh anymoreWhat was all of it for? We don't talk anymoreLike we used to doWe don't talk anymoreThe way did beforeWe don't talk anymoreOoh WooOoh it's such a shameWe don't talk anymore'''
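2. Split the text into words
3. Count occurrences in a dictionary
4. Exclude grammatical (function) words
5. Sort by count
6. Output the TOP 20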
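with open('1.txt', 'r') as fo:       # read the text to analyze; the file is closed automatically
    text = fo.read()                 # named text so the built-in str is not shadowed
text = text.lower()                  # convert to lowercase
for ch in ',.?':
    text = text.replace(ch, ' ')     # replace punctuation with spaces
words = text.split()                 # split on any whitespace, so no empty strings are produced
exc = {'to', 'a', 'of', 'it'}        # high-frequency words that carry no meaning
dic = {}
keys = set(words)                    # set of distinct words that appear
keys = keys - exc                    # exclude grammatical words
print(words)
for w in keys:
    dic[w] = words.count(w)          # build the count dictionary
print(dic)
wc = list(dic.items())               # list of (word, count) pairs
wc.sort(key=lambda x: x[1], reverse=True)  # sort by count, descending
print(wc)
for i in range(min(20, len(wc))):    # output the TOP 20 (fewer if the text has fewer distinct words)
    print(wc[i])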
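As an aside, the same steps can be sketched more compactly with the standard library's `collections.Counter`. This is an alternative, not the approach used in the script above. The function name `top_words`, the constant `STOP_WORDS`, and the sample sentence are made up for illustration. The regular expression also assumes that apostrophes belong to words, so contractions such as "didn't" stay in one piece.

```python
import re
from collections import Counter

# Assumption: the same four stop words that the script above excludes.
STOP_WORDS = {"to", "a", "of", "it"}


def top_words(text, n=20, stop_words=STOP_WORDS):
    """Return up to n (word, count) pairs, most frequent first."""
    # Lowercase, then keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter does the counting; stop words are filtered out beforehand.
    counts = Counter(w for w in words if w not in stop_words)
    # most_common sorts by count in descending order and truncates to n.
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample text, used only to exercise the function.
    sample = "The cat saw the dog. The dog didn't see the cat, did it?"
    for word, count in top_words(sample, n=3):
        print(word, count)
```

`Counter` makes a single pass over the words, whereas calling `words.count(w)` for every distinct word rescans the whole list each time. For a short lyric the difference is negligible, but it grows with longer texts.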
Run result: