Archive for December 2010
There is plenty of enthusiasm for search engines like Google from researchers and the general public alike.
Google and Google Scholar are well known for the breadth of the information they search: Google brings in news, factual and opinion-related content, while Google Scholar emphasises scientific content across many disciplines.
But do these search tools give as comprehensive a picture of a particular research field as a specialist database?
This is the question that the team behind ETDEWEB (the Energy Technology Data Exchange – World Energy Base), a specialised database of energy-related scientific information, set out to answer by studying user search results.
The ETDE team compared the results of 15 energy-related queries performed on all three systems – ETDEWEB, Google and Google Scholar – using identical words/phrases.
More than 40,000 search result records from the three sources were evaluated. The study concluded that ETDEWEB is a significant resource for energy experts seeking relevant energy information: across the 15 searches, nearly 90 per cent of the results found in ETDEWEB did not appear in Google or Google Scholar.
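In set terms, that headline figure is a simple uniqueness computation. A toy sketch in Python, with invented record identifiers standing in for the study's matched records:

```python
# Toy sketch of the uniqueness figure: the share of one source's results
# that the other sources never returned. Record IDs are invented.
etdeweb = {"rec1", "rec2", "rec3", "rec4", "rec5"}
google = {"rec2", "web1", "web2"}
scholar = {"rec2", "sch1"}

unique = etdeweb - (google | scholar)
print(f"{100 * len(unique) / len(etdeweb):.0f}% of ETDEWEB results "
      "appeared in neither Google nor Google Scholar")
```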
Google is certainly a highly used and valuable tool for finding significant ‘non-specialist’ information, and Google Scholar does focus on scientific disciplines.
But if a user’s interest is scientific and energy-specific, ETDEWEB continues to hold a strong position in the energy research, technology and development (RTD) information field and adds considerable value to knowledge discovery.
Cutler, Debbie (ETDE). Database versus search engine. Research Information, Dec. 2010 / Jan. 2011. online:
Katherine Allen kindly reported on my Online Information presentation about Science 2.0 in InfoToday…
Allen, Katherine. “Science 2.0”: is there a role for InfoPros? InfoToday, online, posted on 16 December 2010.
Open access has become very popular over the last few years. It is evident in the increasing number of scientific journals being made available free to readers on the Internet, and the increasing number of institutions that are building repositories to house the electronic versions of open-access articles written by scholars at their institutions.
The academic and research communities seem to support this movement and the right to easy, free access to publicly funded scientific information.
But how often do researchers actually use such free publications as readers, and how often do they choose to publish in an OA journal or institutional repository?
How trustworthy do they consider those journals and repositories? Would they prefer that OA repositories be more selective?
Although today about 10-15 percent of scientific peer-reviewed journals are OA and there are several declarations encouraging institutions to build OA repositories, there is still a long way to go, especially where OA repositories are concerned.
This research tries to determine why the acceptance and growth of open access, particularly open access repositories, have been so slow. Among its findings:
- OA repositories do not follow any standard procedures for selecting the articles they include
- The vast majority of survey participants state that they would be open to contributing to OA repositories that followed the selection procedures used by high-reputation subscription-based journals
- The majority of participants seem well disposed towards acting as strict and demanding reviewers for an OA repository
While we might expect the scientific community to be accustomed to using open access publications, scientists and researchers still seem a little cautious. However, this research shows that they welcome changes that might lead to more credible publications, even if that means their own work will undergo rigorous review.
OA repositories are certainly far more established now than they once were. Still, to win over the scientific community as a whole, steps must be taken to ensure the quality of published information.
Roxana Theodorou. OA Repositories: the Researchers’ Point of View. Journal of Electronic Publishing, Volume 13, Issue 3, December 2010.
This paper introduces two journal metrics recently endorsed by Elsevier’s Scopus: SCImago Journal Rank (SJR) and Source Normalized Impact per Paper (SNIP). SJR weights citations according to the status of the citing journal and aims to measure journal prestige rather than popularity.
It presents the main features of the two indicators, comparing them with one another and with a journal impact measure similar to Thomson Reuters’ journal impact factor (JIF).
The journal impact factor, developed by Eugene Garfield as a tool to monitor the adequacy of coverage of the Science Citation Index, is probably the most widely used bibliometric indicator in the scientific, scholarly and publishing community. However, its extensive use for purposes for which it was not designed has prompted a series of criticisms, and with them attempts to adapt the measure to new user needs.
In January 2010, Scopus endorsed two such measures that had been developed by its partners and bibliometric experts: the SCImago Research Group, based in Spain (…), and the Centre for Science and Technology Studies (CWTS), based in Leiden, the Netherlands (…). The two metrics that were endorsed are SCImago Journal Rank (SJR) and Source Normalized Impact per Paper (SNIP).
Compared with other main fields, the life sciences and health sciences tend to show the highest SJR and RIP (raw impact per paper) values. Compared with the basic, JIF-like RIP, SJR tends to widen the differences between journals and enhances the position of the most prestigious ones, especially – though not exclusively – in the life and health sciences.
The fact that Scopus introduced these two complementary measures reflects the notion that journal performance is a multi-dimensional concept, and that there is no single ‘perfect’ indicator of journal performance.
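To make the prestige-weighting idea concrete, the core of SJR can be sketched as a PageRank-style recursion (a simplified form for intuition only, not the published formula):

$$P_j = \frac{1-d}{N} + d \sum_{i=1}^{N} P_i \, \frac{c_{ij}}{\sum_{k} c_{ik}}$$

where $P_j$ is the prestige of journal $j$, $c_{ij}$ the number of citations from journal $i$ to journal $j$, $N$ the number of journals and $d$ a damping factor. The published SJR further normalises this prestige by the journal's article output, making it a per-paper measure.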
Lisa Colledge, Félix de Moya‐Anegón, Vicente Guerrero‐Bote, et al. SJR and SNIP: two new journal metrics in Elsevier’s Scopus. Serials: The Journal for the Serials Community, Volume 23, Number 3, November 2010, pp. 215–221.
When celebrating the 20 million articles last July (see my previous post), I missed this excellent review of the famous platform.
- A central index freely available globally: Many biomedical scientists probably take PubMed for granted, but try to imagine biology and medicine without it – we would struggle to find anything.
- Twenty million citations: That’s a lot of data and it’s growing at a rate of about one paper per minute (on average).
- More than a billion searches in 2009: That’s an average of 3.5 million searches per day or 40 searches per second …
- PubMed is too big and full of noise: Theodore Sturgeon’s law states that 90% of everything is rubbish. If correct, this means around 18 million records in PubMed are worthless junk. But that won’t stop them cluttering up the database and your search results, making it harder to find what you want when you need it. Many of the papers indexed by PubMed are “salami-sliced” by publication-hungry scientists into the least publishable unit and are of little or no actual scientific value. It can be difficult (or impossible) to find what you need in PubMed. Cameron Neylon calls this the discovery deficit; however you describe it, finding the information you need in PubMed can be frustratingly difficult – despite the redesigns. There is so much in PubMed that it is impossible to keep up.
- PubMed is too small: Some people argue that an overly conservative indexing and editorial policy prevents PubMed from including lots of biomedically relevant literature that is published in physics, chemistry, mathematics, engineering and computer science journals. Currently much of this data is excluded from the database. Actually, what we really need is PubSCIENCE (covering non-medical sciences) but that idea got tragically axed back in 2002.
- Identity crisis, ambiguous authors: many different scientists publish under the same (or very similar) names, and PubMed records offer no reliable way to tell them apart.
- Identity crisis, missing document identifiers: There are over forty million unique document IDs in the form of DOIs. They are a useful way to uniquely identify papers on the Web and to link directly to their full content wherever it was originally published. But you might have trouble using DOIs in PubMed. Sometimes DOIs get left out of records altogether. When they are included, they can get buried and are not very accessible: a record may have a DOI that appears nowhere in the default page served by PubMed, which means you can’t easily click through to the full text of the article the DOI would take you to. What this means is that PubMed is not as well integrated with other databases as it could and should be (a minimal sketch after this list shows one way to look a record’s DOI up programmatically).
- Mostly abstracts only: PubMed has 20 million freely available abstracts rather than 20 million full-text papers. Imagine how the rate of scientific discovery and invention might increase (and the cost might decrease) if it were PubMed Central that had 20 million citations instead of just PubMed. Alas, PubMed Central is currently closer to the 2 million mark than the 20 million mark, but it is growing rapidly thanks to deposition mandates and open access publishing.
- Ranking results: by default PubMed ranks search results by date – but if Google did the same, very few people would bother to use it. Ranking results by relevance, using an algorithm more like PageRank, would be much more useful to many users, as demonstrated by Pierre Lindenbaum (see the toy sketch after this list).
- Text mining and ontologies: We’ve still a long way to go before fully exploiting the possibilities offered by text-mining and ontologies to allow PubMed users to semantically search and browse the data. MeSH is just the beginning but that’s another story…
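On the missing-DOI point above, here is a minimal sketch of looking up a record’s DOI via the NCBI E-utilities esummary service. The PMID is an arbitrary example, and the item names are read defensively since DOIs may be absent or stored under slightly different labels:

```python
# Minimal sketch: fetch a PubMed record's DOI, when it has one, via the
# NCBI E-utilities "esummary" service.
import urllib.request
import xml.etree.ElementTree as ET

def pubmed_doi(pmid):
    url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
           "?db=pubmed&id=" + pmid)
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    # DOIs, when present, appear as an <Item> whose Name is "DOI" or
    # "doi" (the latter inside the ArticleIds list).
    for item in root.iter("Item"):
        if item.get("Name", "").lower() == "doi":
            return item.text
    return None  # record carries no DOI in its summary

print(pubmed_doi("20837576"))  # hypothetical example PMID
```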
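And on relevance ranking, a toy PageRank over a hand-made citation graph illustrates the kind of ordering the post has in mind. The graph and damping factor are invented for illustration, not anything PubMed actually computes:

```python
# Toy PageRank-style ranking over a small citation graph: papers cited
# by highly-ranked papers rise to the top, instead of date ordering.
damping = 0.85
citations = {           # paper -> papers it cites (hypothetical IDs)
    "p1": ["p2", "p3"],
    "p2": ["p3"],
    "p3": ["p1"],
    "p4": ["p1", "p3"],
}

papers = sorted(citations)
rank = {p: 1.0 / len(papers) for p in papers}
for _ in range(50):  # power iteration until scores settle
    new = {p: (1 - damping) / len(papers) for p in papers}
    for src, outs in citations.items():
        if outs:
            share = damping * rank[src] / len(outs)
            for dst in outs:
                new[dst] += share
    rank = new

for p in sorted(rank, key=rank.get, reverse=True):
    print(p, round(rank[p], 3))
```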
PubMed represents a substantial fourteen years of work that continues to bring significant benefits to many scientists around the world. There is plenty of room for improvement, but it’s hard to imagine Life® without PubMed®.
Duncan Hull. Twenty million papers in PubMed: a triumph or a tragedy? O’Really, online, posted on July 27, 2010:
A new source that is worth testing…
Free Full Text, a beta platform by Knowmade, is based on Google Custom Search technology, with a search engine indexing over 10 million free PDFs in scientific fields.
“Our search engine currently indexes full-text scientific articles from more than 120 databases, web editors, open archives and so on. … This figure is constantly changing because we are always expanding the site,” Brice Sagot, the founder of the small French business, told me.
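As an aside, the Custom Search JSON API makes this kind of vertical engine easy to query programmatically. A minimal sketch, where the API key, engine ID and query are placeholders and nothing here reflects Knowmade’s actual configuration:

```python
# Minimal sketch of querying a Google Custom Search engine via the
# Custom Search JSON API. API_KEY and ENGINE_ID are placeholders.
import json
import urllib.parse
import urllib.request

API_KEY = "your-api-key"    # placeholder
ENGINE_ID = "your-cse-id"   # placeholder (the "cx" parameter)

def cse_search(query, num=10):
    params = urllib.parse.urlencode(
        {"key": API_KEY, "cx": ENGINE_ID, "q": query, "num": num})
    url = "https://www.googleapis.com/customsearch/v1?" + params
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return [(item["title"], item["link"]) for item in data.get("items", [])]

# Hypothetical query restricted to PDFs, in the spirit of Free Full Text
for title, link in cse_search("photosynthesis filetype:pdf"):
    print(title, "->", link)
```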
At the source of the project was frustration with the incomplete indexing of Google Scholar:
“Google Scholar is not comprehensive, and we have identified certain types of publications that are not indexed. For example, in some scientific journals (e.g. PNAS), in addition to the article, authors can publish ‘Supporting Information’ specifying materials and methods, additional figures and so on. … And Google Scholar does not index this information, which can be very interesting.”
Freely accessible at:
Let’s keep an eye on this promising initiative…