This README.txt file was generated on by

-------------------
GENERAL INFORMATION
-------------------

Title of Dataset: The MDPI dataset: a link analysis

Author Information:

Author #1: Orduña-Malea, E., Universitat Politècnica de València, Camino de Vera s/n, 46022, Valencia (Spain)

Author #2: Aguillo, Isidro F., Consejo Superior de Investigaciones Científicas - Instituto de Bienes y Políticas Públicas, Albasanz 26, 28037 Madrid (Spain), isidro.aguillo@cchs.csic.es

Date of data collection: 2021-07-14

Geographic location of data collection: Valencia (Spain). 39.48512,-0.34134

Information about funding sources or sponsorship that supported the collection of the data: Generalitat Valenciana, Posicionamiento académico web de las universidades españolas: diseño y aplicación de un modelo de análisis multinivel y multidimensional (UniverSEO), GV/2021/141.

General description: This dataset includes the raw data used to carry out the work titled "Are link-based and citation-based journal metrics correlated? An Open Access mega publisher case study". It contains the bibliometric and webometric indicators collected to characterize all the journals published by MDPI, an academic publisher.

Keywords:

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Open Access to data: Open.

Date end Embargo: N/A

Licenses/restrictions placed on the data, or limitations of reuse: Creative Commons (CC-BY)

Citation for and links to publications that cite or use the data: To be included.

Links/relationships to previous or related data sets: N/A

Links to other publicly accessible locations of the data: N/A

--------------------
DATA & FILE OVERVIEW
--------------------

File list:

-> journal-level-metrics.csv: includes 56 performance metrics (bibliometric and webometric indicators) for 352 journals, obtained from the MDPI website, Majestic, and Scopus.

-> link-data.csv: includes 28 variables describing each link that targets an MDPI journal.
   Data obtained from Majestic.

Relationship between files: the journal-level-metrics file contains overall data at the journal level, whereas the link-data file contains only the raw data for each link that an MDPI journal receives. These raw data are subsequently used to build columns AH to BG in the journal-level-metrics file.

Type of version of the dataset: raw data

Total size: (7.990 KB); dataset uncompressed (136 MB); journal-level-metrics.csv (124 KB), link-data.csv (136 MB)

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Description of methods used for collection/generation of data: Bibliometric variables were captured directly from Scopus, while descriptive data were captured from the MDPI website. Webometric data were captured directly from the Majestic database. Network variables (eigencentrality and pagerank) were calculated with Gephi 0.91, using the links from linking websites to linked websites.

Methods for processing the data: the raw data in link-data.csv were aggregated at the journal level to generate the webometric variables included in the journal-level-metrics.csv file. Descriptive statistics (average, median, and 90th percentile) were then calculated.

Software- or Instrument-specific information needed to interpret the data, including software and hardware version numbers: any spreadsheet application.
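The aggregation described under "Methods for processing the data" can be sketched in Python as follows. This is only an illustration, not the authors' procedure: it assumes the grouping column is headed "Journal" and that values use a decimal point, and it uses a linear-interpolation 90th percentile, since this README does not state which percentile definition was applied. The sample rows are made up.

```python
import csv
import io
import statistics
from collections import defaultdict


def percentile(values, q):
    """Percentile (q in 0..100) with linear interpolation between ranks.

    Assumption: the README does not say which percentile definition
    was used, so results may differ slightly from the published columns.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values to summarize")
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def aggregate_by_journal(rows, column):
    """Group one numeric link-level column by journal, skipping blanks."""
    groups = defaultdict(list)
    for row in rows:
        raw = (row.get(column) or "").strip()
        if raw:  # an empty cell means missing data
            groups[row["Journal"]].append(float(raw))
    return {
        journal: {
            "avg": statistics.mean(vals),
            "median": statistics.median(vals),
            "p90": percentile(vals, 90),
        }
        for journal, vals in groups.items()
    }


# Made-up rows standing in for link-data.csv (hypothetical values).
SAMPLE = """Journal,Source Trust Flow
Alpha,10
Alpha,20
Alpha,
Alpha,30
Beta,5
"""

stats = aggregate_by_journal(csv.DictReader(io.StringIO(SAMPLE)), "Source Trust Flow")
print(stats["Alpha"])
```

To run it against the real file, replace the in-memory sample with an open file handle for link-data.csv and pass the column of interest, for example "Link Density".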
Standards and calibration information, if appropriate:

Environmental/experimental conditions: N/A

Describe any quality-assurance procedures performed on the data: N/A

--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

link-data.csv

Number of variables: 28

Number of cases/rows: 567900

Variable list: Journal, Ref Domain, Source URL, Source Trust Flow, Source Citation Flow, Target URL, Link Type, Link Density, Is Redirect, Is NoFollow, Is Lost, Source Internal OutLinks, Source External OutLinks, Source Total OutLinks, Source External OutDomains, Source Page Size (KB), Source Language Description, Source Language Confidence, Ref Domain Trust Flow, Ref Domain Citation Flow, Target Trust Flow, Target Citation Flow, Source Topical Trust Flow Topic #1, Source Topical Trust Flow Value #1, Ref Domain Topical Trust Flow Topic #1, Ref Domain Topical Trust Flow Value #1, Target Topical Trust Flow Topic #1, Target Topical Trust Flow Value #1

Missing data codes: missing data are indicated by the absence of a value.
Specialized formats or other abbreviations used: N/A

journal-level-metrics.csv

Number of variables: 59

Number of cases/rows: 352

Variable list: # (journal id), Journal Name, ISSN, Launched, Age, Articles, URL, Subject 1, Subject 2, Subject 3, Subject 4, Subject 5, Subject 6, Subject 7, Subject 8, Subject 9, Subject 10, Number of subjects, Profile, SCOPUS (yes/no), SNIP, SJR, CiteScore, Publications (all), Documents (2017-20), Cit-i10, Cit10/Pub, Citations (all scopus), Citations (2017-20), % Cited (recent), % Cited (all scopus), Eigencentrality, pageranks, Links (all), Links (English), Links (TF25), Links (TF75), Links (all)/Pubs, Links (English)/Pubs, Links (TF25)/Pubs, Links (TF75)/Pubs, Domains, Target TF, Target CF, Link density (Avg), Link density (Median), Link density (P90), Source TF (Avg), Source TF (median), Source TF (P90), Source CF (Avg), Source CF (median), Source CF (P90), Source External OutLinks (Avg), Source External OutLinks (median), Source External OutLinks (P90), Source External OutDomains (Avg), Source External OutDomains (median), Source External OutDomains (P90)

Missing data codes: N/A (Not Applicable); absence of value (missing data).

Specialized formats or other abbreviations used: N/A
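Because journal-level-metrics.csv uses two distinct codes ("N/A" for not applicable and an empty cell for missing data), a reader may want to keep them apart when loading the file. The following Python sketch shows one way to do so. The helper name parse_metric and the sample rows are hypothetical, and the code assumes values are written with a decimal point, which should be checked against the actual file.

```python
import csv
import io

NOT_APPLICABLE = "N/A"  # code documented above for "Not Applicable"


def parse_metric(raw):
    """Return a float, the string "N/A" (not applicable), or None (missing).

    Hypothetical helper; assumes a decimal point as the decimal separator.
    """
    text = (raw or "").strip()
    if text == "":
        return None  # absence of value: missing data
    if text == NOT_APPLICABLE:
        return NOT_APPLICABLE
    return float(text)


# Made-up rows standing in for journal-level-metrics.csv.
SAMPLE = """Journal Name,SCOPUS (yes/no),SJR
Example A,yes,0.5
Example B,no,N/A
Example C,yes,
"""

parsed = {
    row["Journal Name"]: parse_metric(row["SJR"])
    for row in csv.DictReader(io.StringIO(SAMPLE))
}
print(parsed)
```

Keeping "N/A" separate from None makes it possible to exclude non-applicable journals (for instance, those not indexed in Scopus) without confusing them with journals whose values are simply missing.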