This README.txt file was generated on 25-03-2023 by Enrique Orduna-Malea

-------------------
GENERAL INFORMATION
-------------------

Title of Dataset: The NASA videos collection on Twitch

Author Information:

	Author #1: Orduña-Malea, E., Universitat Politècnica de València, Camino de Vera s/n, 46022, Valencia (Spain), enorma@upv.es, https://orcid.org/0000-0002-1989-8477
	Author #2: Lopezosa, C., Department d´Informació i Mitjans Audiovisuals, Universitat de Barcelona, Barcelona (Spain), lopezosa@ub.edu, https://orcid.org/0000-0001-8619-2194

Date of data collection: From January 2023 to March 2023.

Geographic location of data collection: Valencia (Spain). 39.48512,-0.34134 

Information about funding sources or sponsorship that supported the collection of the data: Generalitat Valenciana, Posicionamiento académico web de las universidades españolas: diseño y aplicación de un modelo de análisis multinivel y multidimensional (UniverSEO), GV/2021/141.

General description: This dataset includes the raw data used to conduct a Twitch case study. It comprises the metrics collected from the Twitch API to characterize a specific channel (NASA), the bibliographic data collected from bibliographic databases to systematically review the literature about Twitch, supplementary material, and the scripts used to collect data from the Twitch API.
Keywords: Science communication; Science studies; social media metrics; video streaming; Twitch; informetrics.


--------------------------
SHARING/ACCESS INFORMATION
-------------------------- 

Open Access to data: Open.

Date end Embargo: N/A

Licenses/restrictions placed on the data, or limitations of reuse: Creative Commons (CC-BY)

Citation for and links to publications that cite or use the data: 

Orduña-Malea, E.; Lopezosa, C. (2024). Uncovering the potential of Twitch as a source for social media metrics. First Monday, 29(1).
https://dx.doi.org/10.5210/fm.v29i1.13214

Links/relationships to previous or related data sets: N/A
Links to other publicly accessible locations of the data: N/A



--------------------
DATA & FILE OVERVIEW
--------------------

File list: 

dataset.zip
-> nasa: this folder includes the JSON files with raw data from Twitch.
-> nasa/cheermotes.json: includes the cheermotes available for NASA's Twitch account.
-> nasa/clips.json: includes the clips created from videos published by NASA's Twitch account.
-> nasa/followers.json: includes the list of users that follow NASA's Twitch account.
-> nasa/videos.json: includes the list of videos published and deposited on NASA's Twitch account.
-> scripts: this folder includes Python scripts.
-> scripts/twitch_get-clips.py: a Python script that extracts the clips created from specific Twitch accounts.
-> scripts/twitch_get-followers.py: a Python script that extracts the users that follow one specific Twitch account.
-> scripts/twitch_get-users.py: a Python script that extracts Twitch user data.
-> scripts/twitch_get-videos.py: a Python script that extracts the videos created by specific Twitch accounts.
-> scripts/twitch_search-channels.py: a Python script that searches Twitch channels according to specific parameters.
-> corpus: this folder includes information about the bibliographic records collected to carry out a systematic literature review on Twitch.
-> corpus/records.csv: this Excel file includes the bibliographic record for each publication about Twitch that was collected. Basic bibliographic metadata is provided for each record, together with all data returned by Humata for each publication.
-> supplements: this folder includes supplementary material created to accompany specific publications derived from this dataset.
-> supplements/supplement_1.pdf: includes information about the official Twitch API (see Sharing/Access information to obtain more information).


Relationship between files: this dataset provides data related to a study that tests Twitch as a data source for science communication studies. The Python scripts were used to collect data from NASA's Twitch account, used as a case study. The corpus is an independent set of data covering all publications about Twitch. The supplementary material folder holds specific supplementary data. Supplement 1 includes information about the characteristics of the Twitch API.

Type of version of the dataset: raw data

Total size: dataset (941 MB); corpus (243 KB); nasa (940 MB); scripts (13.5 KB); supplements (413 KB).


--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Description of methods used for collection/generation of data: Publications about Twitch were collected from Scopus, Web of Science, Dimensions and Google Scholar, and then curated to exclude false positives. For each publication, Humata was used to extract information about the general topic of the study. Separately, the scripts were used to collect data from the Twitch API. Data were extracted from the NASA channel, used as a case study. Data about videos, clips, users, followers and cheermotes were saved as JSON files.
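
The collection scripts themselves are in the scripts folder. The following is not a copy of them but a minimal sketch of cursor-based pagination, the mechanism the public Twitch API documentation describes for returning long lists. The names collect_all and fake_fetch are hypothetical, and the page layout (a "data" list plus a "pagination" object with a "cursor") is an assumption taken from that documentation. The network call is replaced by an injected function so that the sketch runs offline.

```python
from typing import Callable, Optional


def collect_all(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Follow pagination cursors until the API stops returning one."""
    records = []
    cursor = None
    while True:
        page = fetch_page(cursor)
        records.extend(page.get("data", []))
        cursor = page.get("pagination", {}).get("cursor")
        if not cursor:
            return records


# Hypothetical stand-in for an HTTP request; the page contents are invented.
FAKE_PAGES = {
    None: {"data": [{"id": "1"}, {"id": "2"}], "pagination": {"cursor": "abc"}},
    "abc": {"data": [{"id": "3"}], "pagination": {}},
}


def fake_fetch(cursor: Optional[str]) -> dict:
    return FAKE_PAGES[cursor]


all_records = collect_all(fake_fetch)
print(len(all_records))
```

In real use, fake_fetch would be replaced by a function that sends the authenticated request and returns the decoded JSON response.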

Methods for processing the data: All collected data were exported to spreadsheets to generate descriptive statistics. Tableau and Drive were also used to generate visualizations.
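
For readers who prefer to stay in Python rather than use spreadsheets, a minimal sketch of the same kind of descriptive statistics follows. It assumes each JSON file holds either a list of records or an object with a "data" list, which should be checked against the actual files. load_records and describe are hypothetical helper names, and the sample records are invented for illustration.

```python
import json
import statistics


def load_records(text: str) -> list:
    """Parse JSON text into a list of records (the layout is an assumption)."""
    payload = json.loads(text)
    if isinstance(payload, dict):
        return payload.get("data", [])
    return payload


def describe(records: list, field: str) -> dict:
    """Basic descriptive statistics for one numeric field."""
    # Non-numeric entries (for example the N/A code) are skipped.
    values = [r[field] for r in records if isinstance(r.get(field), (int, float))]
    if not values:
        return {"n": 0}
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


sample = (
    '[{"id": "a", "view_count": 10},'
    ' {"id": "b", "view_count": 30},'
    ' {"id": "c", "view_count": "N/A"}]'
)
summary = describe(load_records(sample), "view_count")
print(summary)
```

With the real data, the text passed to load_records would be the contents of a file such as nasa/clips.json.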

Software- or Instrument-specific information needed to interpret the data, including software and hardware version numbers: No file requires specific software to be opened or used.

Standards and calibration information, if appropriate: N/A

Environmental/experimental conditions: N/A

Describe any quality-assurance procedures performed on the data: N/A



--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

records.xlsx

Number of variables: 7

Number of cases/rows: 449

Variable list:
ID, Title, Year, DOI, Full-text, Tested, Humata.
   
Missing data codes: missing data is noted with N/A.

Specialized formats or other abbreviations used: None
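
Because missing values are written as the literal string N/A, they need to be converted before analysis. The sketch below assumes the corpus is available as a comma-separated file whose header row contains the seven variable names listed above; a spreadsheet workbook would first need to be exported to CSV. read_corpus is a hypothetical helper name, and the two sample rows are invented.

```python
import csv
import io


def read_corpus(handle) -> list:
    """Read the corpus rows, turning the N/A code into None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({k: (None if v == "N/A" else v) for k, v in row.items()})
    return rows


# Invented sample standing in for the real corpus file.
sample_csv = io.StringIO(
    "ID,Title,Year,DOI,Full-text,Tested,Humata\n"
    "1,Example title A,2020,N/A,Yes,Yes,Example summary\n"
    "2,Example title B,2021,10.0000/example,No,N/A,N/A\n"
)
corpus = read_corpus(sample_csv)
missing_doi = sum(1 for row in corpus if row["DOI"] is None)
print(missing_doi)
```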

cheermotes.json

Number of variables: 31

Number of cases/rows: 35

Variable list:
type, order, prefix, last_updated, is_charitable, tiers - _ - id, tiers - _ - color, tiers - _ - min_bits, tiers - _ - can_cheer, tiers - _ - show_in_bits_card, tiers - _ - images - dark - static - 1, tiers - _ - images - dark - static - 2, tiers - _ - images - dark - static - 3, tiers - _ - images - dark - static - 4, tiers - _ - images - dark - static - 1.5, tiers - _ - images - dark - animated - 1, tiers - _ - images - dark - animated - 2, tiers - _ - images - dark - animated - 3, tiers - _ - images - dark - animated - 4, tiers - _ - images - dark - animated - 1.5, tiers - _ - images - light - static - 1, tiers - _ - images - light - static - 2, tiers - _ - images - light - static - 3, tiers - _ - images - light - static - 4, tiers - _ - images - light - static - 1.5, tiers - _ - images - light - animated - 1, tiers - _ - images - light - animated - 2, tiers - _ - images - light - animated - 3, tiers - _ - images - light - animated - 4, tiers - _ - images - light - animated - 1.5

   
Missing data codes: missing data is noted with N/A.

Specialized formats or other abbreviations used: None
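
The cheermotes variable names read as flattened paths into a nested JSON structure: path segments are joined with " - " and "_" apparently stands for a position in the tiers list. A minimal sketch of that flattening follows. The naming rule is inferred from the variable list rather than documented, flatten is a hypothetical function name, and the sample record is invented.

```python
def _join(prefix: str, part: str) -> str:
    """Join two path segments with the ' - ' separator."""
    return f"{prefix} - {part}" if prefix else part


def flatten(value, prefix: str = "") -> set:
    """Return the set of flattened key paths found in a nested structure."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            paths |= flatten(child, _join(prefix, str(key)))
    elif isinstance(value, list):
        for child in value:
            # List positions collapse to "_" so every tier shares one name.
            paths |= flatten(child, _join(prefix, "_"))
    else:
        paths.add(prefix)
    return paths


sample = {
    "prefix": "Cheer",
    "tiers": [
        {"id": "1", "min_bits": 1, "images": {"dark": {"static": {"1.5": "url"}}}},
        {"id": "100", "min_bits": 100, "images": {"dark": {"static": {"1.5": "url"}}}},
    ],
}
names = flatten(sample)
print(sorted(names))
```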

clips.json

Number of variables: 16

Number of cases/rows: 51,935

Variable list: id, url, title, game_id, video_id, language, duration, embed_url, creator_id, view_count, created_at, vod_offset, creator_name, thumbnail_url, broadcaster_id, broadcaster_name
   
Missing data codes: missing data is noted with N/A.

Specialized formats or other abbreviations used: None
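
Each clip record carries a video_id variable, which suggests that clips can be grouped by the video they were cut from. The sketch below assumes that video_id matches the id variable in videos.json and that it may be empty when the source video is no longer available. Both points should be verified against the data. clips_per_video is a hypothetical name and the sample records are invented.

```python
from collections import Counter, defaultdict


def clips_per_video(clips: list):
    """Count clips and sum their views for each source video."""
    counts = Counter()
    views = defaultdict(int)
    for clip in clips:
        # Clips without a source video are grouped under "unknown".
        video_id = clip.get("video_id") or "unknown"
        counts[video_id] += 1
        views[video_id] += int(clip.get("view_count", 0))
    return counts, dict(views)


sample_clips = [
    {"id": "c1", "video_id": "v1", "view_count": 5},
    {"id": "c2", "video_id": "v1", "view_count": 7},
    {"id": "c3", "video_id": "", "view_count": 2},
]
counts, views = clips_per_video(sample_clips)
print(counts.most_common(1))
```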

followers.json

Number of variables: 7

Number of cases/rows:

Variable list:
to_id, from_id, to_name, to_login, from_name, from_login, followed_at
   
Missing data codes: missing data is noted with N/A.

Specialized formats or other abbreviations used: None

videos.json

Number of variables: 17

Number of cases/rows: 197

Variable list:
id, url, type, title, user_id, viewable, language, duration, stream_id, user_name, user_login, created_at, view_count, description, published_at, thumbnail_url, muted_segments

   
Missing data codes: missing data is noted with N/A.

Specialized formats or other abbreviations used: None
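
The duration variable of a video is likely a string rather than a number. Assuming it follows the compact form shown in the public Twitch API documentation (for example 3m21s), it can be converted to seconds as sketched below. duration_to_seconds is a hypothetical helper, and the format should be verified against the actual file.

```python
import re

# Optional hours, minutes and seconds, each followed by its unit letter.
_DURATION = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def duration_to_seconds(text: str) -> int:
    """Convert a string such as '1h2m3s' to a number of seconds."""
    cleaned = text.strip()
    match = _DURATION.match(cleaned)
    if not cleaned or not match:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


print(duration_to_seconds("3m21s"))  # 201
```

Values that do not match the pattern, including the N/A code, raise ValueError so that they can be handled explicitly.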