APPROACHES TO ASSESSING THE SEMANTIC SIMILARITY AND FUTURE CITATION OF PUBLICATIONS BY IDENTIFYING INFORMATIVE TERMS WITH PREDICTIVE PROPERTIES
Abstract and keywords
Abstract (English):
The article discusses new approaches to assessing the semantic similarity of documents in a vector space, taking into account statistically significant, informative terms. Informative terms reflect the current state of research in a given field. To select informative terms, an algorithm for calculating the impact factor of the term (IFT) is proposed. It is shown that informative terms make it possible both to evaluate the semantic similarity of texts and to predict future citations. The developed methods for assessing the semantic similarity and future impact of scientific publications can be used within the framework of “Predictive optimization”, a modern technology for making decisions based on forecasts. Bibliometric indicators often play an important role in evaluating research activity and individual scientists. However, citation-based indicators are problematic for determining the impact of recent publications: two years after publication, most articles have usually received only a few citations. The probability of future citation can be predicted using the proposed indicator, the IFT.

Keywords:
semantic similarity, informative terms, impact factor of the term, citations, statistical analysis, citation prediction
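
As a rough illustration of the vector-space idea summarized in the abstract, the sketch below compares two documents by cosine similarity over term-count vectors restricted to a set of informative terms. The term set, the optional per-term weights, and the function names `term_vector` and `cosine_similarity` are hypothetical and chosen only for this example. The article's actual IFT algorithm is not reproduced here.

```python
import math
from collections import Counter


def term_vector(text, informative_terms):
    """Count occurrences of informative terms only.

    The whitespace tokenizer and the term filter are assumptions
    made for this sketch, not the article's method.
    """
    tokens = text.lower().split()
    return Counter(t for t in tokens if t in informative_terms)


def cosine_similarity(vec_a, vec_b, weights=None):
    """Cosine similarity of two sparse count vectors.

    `weights` is an optional, hypothetical per-term weight (for example
    a score such as an IFT value); terms without a weight default to 1.0.
    """
    weights = weights or {}

    def w(term):
        return weights.get(term, 1.0)

    # Dot product over the terms the two vectors share.
    dot = sum((w(t) * vec_a[t]) * (w(t) * vec_b[t]) for t in vec_a if t in vec_b)
    norm_a = math.sqrt(sum((w(t) * c) ** 2 for t, c in vec_a.items()))
    norm_b = math.sqrt(sum((w(t) * c) ** 2 for t, c in vec_b.items()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no informative terms in at least one document
    return dot / (norm_a * norm_b)


# Example usage with a made-up set of informative terms.
informative = {"citation", "similarity", "semantic"}
doc_a = term_vector("semantic similarity of citation networks", informative)
doc_b = term_vector("citation prediction with semantic features", informative)
print(cosine_similarity(doc_a, doc_b))
```

Restricting the vectors to informative terms keeps common but uninformative words from dominating the score. The optional weights show one way a per-term indicator could be folded into the comparison.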
