A new pre-clustering-based latent semantic analysis algorithm for document retrieval
Journal of Yunnan Minzu University (Natural Sciences Edition), 2015, 24(3): 257-260

He Xiaoping (和晓萍)

Abstract


This paper proposes a document retrieval algorithm based on pre-clustering and latent semantic analysis (LSA). First, the document collection to be searched is pre-clustered: k-means clustering is applied on top of the LSA representation to find the center of each cluster. Second, at query time, retrieval is performed by computing the similarity between the query vector and each cluster center. This addresses a shortcoming of existing LSA-based retrieval algorithms, which spend a large amount of time computing the similarity between the query vector and every document vector. In addition, a new feature-weighting method is given that is tailored to the characteristics of document retrieval. Experimental results show that the method shortens retrieval time and improves retrieval efficiency.
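The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the toy term-by-document matrix, the function names (`lsa_fit`, `kmeans`, `search`), the raw term counts used in place of the paper's feature-weighting method (which the abstract does not specify), the farthest-point initialization of k-means, and the final ranking of documents inside the selected cluster are all assumptions made for illustration.

```python
import numpy as np

def unit_rows(X):
    """Scale each row to unit length (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def cosine(v, M):
    """Cosine similarity between vector v and each row of matrix M."""
    denom = np.linalg.norm(v) * np.linalg.norm(M, axis=1)
    return (M @ v) / np.where(denom == 0, 1.0, denom)

def lsa_fit(A, rank):
    """Truncated SVD of a terms-by-documents matrix A.
    Returns the term basis U_k and unit-length document vectors in latent space."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :rank]
    return U_k, unit_rows(A.T @ U_k)

def kmeans(X, k, iters=50):
    """Plain k-means; farthest-point initialization is an assumption chosen
    to keep this toy example deterministic."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers, labels

def search(q, U_k, centers, labels, docs_k, top_clusters=1):
    """Compare the query with cluster centers only, then (as an assumption)
    rank the documents that belong to the best-matching cluster(s)."""
    q_k = q @ U_k                                   # project query into latent space
    best = np.argsort(-cosine(q_k, centers))[:top_clusters]
    candidates = np.flatnonzero(np.isin(labels, best))
    sims = cosine(q_k, docs_k[candidates])
    order = np.argsort(-sims)
    return candidates[order], sims[order]

# Hypothetical toy corpus: rows are terms, columns are documents.
# Documents 0-2 use only terms 0-2; documents 3-5 use only terms 3-5.
A = np.array([
    [2, 1, 2, 0, 0, 0],
    [1, 2, 2, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 0, 2],
], dtype=float)

U_k, docs_k = lsa_fit(A, rank=2)            # offline: LSA
centers, labels = kmeans(docs_k, k=2)       # offline: pre-clustering

q = np.array([0, 0, 0, 1, 0, 1], dtype=float)   # query containing terms 3 and 5
hits, scores = search(q, U_k, centers, labels, docs_k)
print("cluster labels:", labels)
print("retrieved documents:", sorted(hits.tolist()))
```

In this sketch the query is compared against k cluster centers rather than against all n document vectors, which is the source of the time saving the abstract describes; only the members of the selected cluster are scored afterwards.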
