eTBLAST

eTBLAST was a free text-similarity search engine, now defunct, that offered access to the MEDLINE database, the National Institutes of Health (NIH) CRISP database, the Institute of Physics (IOP) database, Wikipedia, arXiv, the NASA technical reports database, Virginia Tech class descriptions and a variety of databases of clinical interest. Further text-based databases were added over time. eTBLAST searched citation databases[1][2] and databases containing full text,[3] such as PubMed. The eTBLAST server compared a user's natural-text query to target databases using a hybrid search algorithm consisting of a low-sensitivity, weighted keyword-based first pass followed by a novel sentence-alignment-based second pass. eTBLAST was a free web-based service of The Innovation Laboratory at the Virginia Bioinformatics Institute.
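The two-pass idea described above can be illustrated with a short Python sketch. This is not eTBLAST's actual implementation: the function names, the inverse-document-frequency weighting in the first pass and the use of `difflib.SequenceMatcher` as a stand-in for sentence alignment in the second pass are all assumptions chosen for illustration.

```python
import math
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def keyword_pass(query, documents, top_k=2):
    """First pass (assumed form): rank documents by weighted keyword overlap.

    Each shared word contributes log(1 + N / df), so rare words count more.
    Returns the indices of up to top_k documents with a non-zero score.
    """
    n = len(documents)
    doc_tokens = [set(tokenize(d)) for d in documents]
    df = {}
    for tokens in doc_tokens:
        for t in tokens:
            df[t] = df.get(t, 0) + 1
    query_tokens = set(tokenize(query))
    scores = []
    for i, tokens in enumerate(doc_tokens):
        score = sum(math.log(1 + n / df[t]) for t in query_tokens & tokens)
        scores.append((score, i))
    scores.sort(key=lambda p: (-p[0], p[1]))
    return [i for score, i in scores[:top_k] if score > 0]


def alignment_score(query, document):
    """Second pass (assumed form): align each query sentence to its best
    matching document sentence and average the word-level match ratios."""
    doc_sentences = [tokenize(s) for s in split_sentences(document)]
    query_sentences = [tokenize(s) for s in split_sentences(query)]
    if not doc_sentences or not query_sentences:
        return 0.0
    total = 0.0
    for q in query_sentences:
        total += max(SequenceMatcher(None, q, d).ratio() for d in doc_sentences)
    return total / len(query_sentences)


def search(query, documents, top_k=2):
    """Run the cheap keyword pass, then re-rank the survivors by alignment."""
    candidates = keyword_pass(query, documents, top_k)
    ranked = [(alignment_score(query, documents[i]), i) for i in candidates]
    ranked.sort(key=lambda p: (-p[0], p[1]))
    return ranked
```

Calling `search(query, documents)` returns `(score, index)` pairs for the surviving candidates, best first. The design point is that the inexpensive first pass narrows a large database to a few candidates, so the costlier alignment step runs only on those.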

eTBLAST, as a text-similarity engine, made possible a large study of duplicate publications and potential plagiarism in the biomedical literature. Thousands of random samples of MEDLINE abstracts were submitted to eTBLAST, and those with the highest similarity were studied and entered into an online database. This study is ongoing, with the database maturing as the entries are manually inspected and classified. This work revealed several trends, including an increasing rate of duplication in the biomedical literature, as reported in the journals Bioinformatics,[4][5] Anaesthesia and Intensive Care,[6] Clinical Chemistry,[7] Urologic Oncology,[8] Nature,[9] and Science.[10]

Interface

Because eTBLAST was a text-similarity engine rather than a simple keyword-based search tool, it was claimed that the user did not need to identify and manipulate query keywords and Boolean operators, as must be done for other search engines.

eTBLAST aimed to help the user rapidly find references, evaluate novelty, find experts and journals in a given topical area,[11] and track the popularity of the topic as defined by the user's query. Information could also be drawn from the results as a set, in addition to that found within individual 'hits'. eTBLAST could also infer possible hypotheses from inspection of implicit keywords found within the most similar 'hits'. A similarity matrix and a heat map were also displayed for the most similar 'hits'.
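As an illustration of what a similarity matrix over the top 'hits' contains, the following Python sketch computes pairwise scores between a handful of texts. The Jaccard word-overlap measure and the function names are assumptions made for this example; the source does not state which similarity measure eTBLAST used for its matrix or heat map.

```python
import re


def tokens(text):
    """Return the set of lower-cased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two word sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(hits):
    """Build a square matrix whose entry [i][j] is the similarity of
    hit i to hit j. The matrix is symmetric; a non-empty hit scores 1.0
    against itself."""
    token_sets = [tokens(h) for h in hits]
    return [[jaccard(a, b) for b in token_sets] for a in token_sets]
```

Each row of the returned matrix could then be rendered as one row of a heat map, with higher values drawn in a stronger colour.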

A typical query of 120 words took less than 10 seconds to return results after comparison with MEDLINE, which as of 8/1/2011 contained over 20 million records.

References

  1. Lewis, J; Ossowski, S; Hicks, J; Errami, M; Garner, HR (2006). "Text similarity: An alternative way to search MEDLINE". Bioinformatics. 22 (18): 2298–304. doi:10.1093/bioinformatics/btl388. PMID 16926219.
  2. Pertsemlidis, A; Garner, HR (2004). "Text comparison based on dynamic programming". IEEE Engineering in Medicine and Biology Magazine. 23 (6): 66–71. doi:10.1109/MEMB.2004.1378640. PMID 15688594.
  3. Sun, Z; Errami, M; Long, T; Renard, C; Choradia, N; Garner, H (2010). Curioso, Walter H, ed. "Systematic Characterizations of Text Similarity in Full Text Biomedical Publications". PLoS ONE. 5 (9): e12704. doi:10.1371/journal.pone.0012704. PMC 2939881. PMID 20856807.
  4. Errami, M; Hicks, JM; Fisher, W; Trusty, D; Wren, JD; Long, TC; Garner, HR (2007). "Deja vu a study of duplicate citations in Medline". Bioinformatics. 24 (2): 243–9. doi:10.1093/bioinformatics/btm574. PMID 18056062.
  5. Errami, M; Sun, Z; George, AC; Long, TC; Skinner, MA; Wren, JD; Garner, HR (2010). "Identifying duplicate content using statistically improbable phrases". Bioinformatics. 26 (11): 1453–7. doi:10.1093/bioinformatics/btq146. PMC 2872002. PMID 20472545.
  6. Loadsman, JA; Garner, HR; Drummond, GB (2008). "Towards the elimination of duplication in Anaesthesia and Intensive Care". Anaesthesia and Intensive Care. 36 (5): 643–5. PMID 18853580.
  7. George, AC; Long, TC; Garner, HR (2010). "Quaere Verum". Clinical Chemistry. 56 (4): 673–4. doi:10.1373/clinchem.2009.130468. PMID 20093558.
  8. Garner, HR (2011). "Combating unethical publications with plagiarism detection services". Urologic Oncology. 29: 95–9. doi:10.1016/j.urolonc.2010.09.016. PMC 3035174. PMID 21194644.
  9. Errami, M; Garner, H (2008). "A tale of two citations". Nature. 451 (7177): 397–9. doi:10.1038/451397a. PMID 18216832.
  10. Long, TC; Errami, M; George, AC; Sun, Z; Garner, HR (2009). "Responding to Possible Plagiarism". Science. 323 (5919): 1293–4. doi:10.1126/science.1167408. PMID 19265004.
  11. Errami, M; Wren, JD; Hicks, JM; Garner, HR (2007). "eTBLAST: A web server to identify expert reviewers, appropriate journals and similar publications". Nucleic Acids Research. 35 (Web Server issue): W12–5. doi:10.1093/nar/gkm221. PMC 1933238. PMID 17452348.
This article is issued from Wikipedia - version of the 11/12/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.