Searching for new preprints and deciding which ones to read tends, often unintentionally, to favour preprints that already have high visibility.
It's difficult to encode factors that might predict low visibility of a paper.
Such factors may include geography or Twitter reach, and they put many first-time preprint authors at a disadvantage.
Instead of trying to predict visibility, we use the actual early visibility of a preprint (abstract views in its first 3 months) in our Shadow Index calculations.
The Shadow Index acts as our boost factor, on a scale from 0 to 100.
A high Shadow Index indicates that the preprint has low visibility, which signals to Hidden Preprints that the paper should be promoted.
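As an illustration only, one way to turn early abstract views into a 0 to 100 score is to invert a preprint's percentile rank within a reference set of preprints. The function name `shadow_index`, the percentile-based formula and the sample view counts below are assumptions made for this sketch, not the project's actual calculation.

```python
from bisect import bisect_right


def shadow_index(early_views, reference_views):
    """Hypothetical Shadow Index: 100 * (1 - percentile rank of early views).

    early_views: abstract views of one preprint in its first 3 months.
    reference_views: early abstract views of the preprints it is compared to.
    This formula is an assumption for illustration, not the project's own.
    """
    if not reference_views:
        raise ValueError("need at least one reference preprint")
    ranked = sorted(reference_views)
    # Number of reference preprints with views at or below this preprint's views.
    at_or_below = bisect_right(ranked, early_views)
    percentile = at_or_below / len(ranked)
    # Low visibility (low percentile) maps to a high Shadow Index.
    return round(100 * (1 - percentile), 1)


# Made-up view counts, used only to exercise the function.
sample_views = [10, 50, 200, 1000]
least_seen = shadow_index(10, sample_views)
most_seen = shadow_index(1000, sample_views)
```

Under these assumptions the least-viewed preprint in the sample receives a higher score than the most-viewed one, so sorting by the score in descending order would surface the papers that most need promotion.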
This prototype uses data from "Complete Rxivist dataset of scraped bioRxiv data" by Richard J. Abdill and Ran Blekhman.
All code is available on GitHub in the Hidden Preprints organisation:
docs: Documentation for the entire project.
web-api: API that supplies the data for this application. (Back-end layer)
web-ui: Code for this application. (Front-end layer)
crossref-citation-networks: Experiments with CrossRef citation network data.
rxivist-data-exploration: Experiments with the dataset to understand the patterns behind preprint views.