Complete News World

The largest electronic catalog of scientific literature has been completed

The General Index makes it possible to search 355 billion words, phrases and sentence fragments drawn from 107 million scientific articles. Publishers may still challenge it.

Free-data advocate Carl Malamud has launched another big venture: the world's largest free online catalog of science. The General Index was created by Malamud's Public Resource, a non-profit organization whose largest undertaking is publishing US legal resources. The index itself is hosted by the Internet Archive.

107 million articles, 355 billion items

The dataset, which includes more than 355 billion words and sentence fragments from the text of the indexed articles, as well as the data tables needed to identify the articles, has been available since October 7. Malamud's project had the support of a number of eminent scholars, led by Fenton J. Serville.

According to a report in Nature, the initiative is of great importance to the scientific world because researchers can form a picture of a particular scientific publication even if they do not have access to its source (for example, because they lack a subscription to a particular journal or archive).

The practical importance of Malamud's initiative was highlighted by Gitanjali Yadav, a computational biologist at the University of Cambridge, who said it would be a tremendous help in locating publications on the subject she works on. (Yadav studies volatile organic compounds (VOCs) emitted by plants and, as she said, much of the information needed for her research is scattered across various publications; with Malamud's index she can now collect it.)

Partial Do-It-Yourself Project

The question may rightly be asked: how does the General Index differ from Google Scholar, which indexes paywalled literature with the consent of publishers? Malamud's answer is that Google Scholar users can run only certain types of text queries, and the service also limits automated searches. For this reason, it is not suitable for computational analyses that require more advanced searches.

The General Index itself arose out of a project that would have allowed the text of scholarly publications to be mined without scholars having access to the text itself. The service, which launched earlier this month, is even simpler: it does not have its own web search page, for example. Anyone who wants to use it has to write their own analysis or research software for the downloaded data. At the same time, Malamud hopes that users of the index will create open-source search engines that will be shared with the scientific community.

This partly do-it-yourself solution is not that simple, considering that the index takes up roughly 5 TB compressed and 38 TB uncompressed. Part of the collection consists of tables containing nearly 20 billion keywords from the processed articles, as well as article titles, authors, and digital object identifiers (DOIs).
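Since users have to write their own software, the following minimal sketch shows one way a keyword lookup over the downloaded tables might work. It is only an illustration: the tab-separated layout, the column names (article_id, keyword, doi, title) and the sample DOIs are hypothetical and do not reflect the actual file format of the General Index. Both tables are read row by row, because at tens of terabytes the data cannot be loaded into memory at once.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text with a header row into dict rows (lazily)."""
    return csv.DictReader(io.StringIO(text), delimiter="\t")


def find_dois(keyword_rows, metadata_rows, term):
    """Return the sorted DOIs of articles tagged with the keyword `term`.

    Both inputs are iterables of dict rows and are consumed one row at a time.
    The column names used here are assumptions, not the real schema.
    """
    term = term.lower()
    # Pass 1: scan the keyword table, keeping only the ids of matching articles.
    matching_ids = {
        row["article_id"]
        for row in keyword_rows
        if row["keyword"].lower() == term
    }
    # Pass 2: scan the metadata table and resolve those ids to DOIs.
    return sorted(
        row["doi"] for row in metadata_rows if row["article_id"] in matching_ids
    )


# Tiny in-memory stand-ins for the downloaded files (made-up sample data).
KEYWORDS_TSV = (
    "article_id\tkeyword\n"
    "a1\tvolatile organic compounds\n"
    "a2\tcopyright law\n"
    "a3\tVolatile Organic Compounds\n"
)
METADATA_TSV = (
    "article_id\tdoi\ttitle\n"
    "a1\t10.0000/example.1\tPlant emissions\n"
    "a2\t10.0000/example.2\tText mining and the law\n"
    "a3\t10.0000/example.3\tLeaf volatiles\n"
)

if __name__ == "__main__":
    hits = find_dois(
        read_tsv(KEYWORDS_TSV),
        read_tsv(METADATA_TSV),
        "volatile organic compounds",
    )
    print(hits)
```

With real files, the in-memory strings would be replaced by open file handles passed to csv.DictReader; the two-pass structure stays the same.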

Important question: is it legal?

According to Malamud, the General Index does not infringe copyright, as it contains only sentence fragments of up to five words from the articles. At the same time, there is no guarantee that publishers will accept this model, which means they may challenge the practice, a legal expert told Nature.
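To make concrete what fragments of up to five words look like, here is a short sketch that splits a sentence into every run of one to five consecutive words. It assumes simple whitespace tokenization and is not the method actually used to build the General Index; the function name ngrams_up_to is made up for this example.

```python
def ngrams_up_to(text, max_words=5):
    """Return every run of 1..max_words consecutive words in `text`.

    Tokenization is a plain whitespace split, which is an assumption made
    for illustration only.
    """
    words = text.split()
    return [
        " ".join(words[start:start + size])
        for size in range(1, max_words + 1)
        for start in range(len(words) - size + 1)
    ]


if __name__ == "__main__":
    for fragment in ngrams_up_to("plants emit volatile organic compounds daily"):
        print(fragment)
```

No fragment is longer than five words, so the full sentence is never stored as a single item once it exceeds that length.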

Michael Carroll, a legal researcher at American University in Washington, DC, sees no impediment to global distribution of the General Index, although he also cautions that copyright rules may vary from country to country. According to Carroll, the question is whether Malamud violated the publishers' terms by copying and processing the articles on which the index is based. (Malamud acknowledged that in order to create the index he had to obtain copies of the 107 million articles processed. However, he did not reveal how he obtained them.)

Nature asked six publishers what they thought, but only its own publisher, Springer Nature, was willing to comment on the General Index. Even it said only that it supports open research initiatives, but that legality matters.
