Commit 35c9af23 authored by Michael Wagner

wrote the readme.md

parent 16daf484
# orcid-harvester
Support script for CRIS.
Researchers can keep their research data and metadata in CRIS, and some of them may already publish on Zenodo.
This script takes one or more [ORCID(s)](https://orcid.org/), searches [Zenodo.org](https://zenodo.org/) for records by those creators, and returns the Zenodo metadata as JSON.
Zenodo also offers an OAI-PMH interface: https://zenodo.org/oai2d?verb=ListRecords&metadataPrefix=oai_dc
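The `oai2d` URL above returns XML in pages. Each page ends with a `resumptionToken` that the client must send back to get the next page. The following sketch parses one such page with the standard library. It assumes the standard OAI-PMH 2.0 and Dublin Core namespaces. The function name `parse_oai_page` is hypothetical and not part of this script.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 and Dublin Core namespace URIs (assumed to match Zenodo's oai2d output)
OAI = "http://www.openarchives.org/OAI/2.0/"
DC = "http://purl.org/dc/elements/1.1/"


def parse_oai_page(xml_text):
    """Return (titles, resumption_token) for one ListRecords response page.

    The token is None when the element is missing or empty, which in
    OAI-PMH marks the last page of the list.
    """
    root = ET.fromstring(xml_text)
    # Collect every Dublin Core title on this page
    titles = [el.text for el in root.iter(f"{{{DC}}}title")]
    token_el = root.find(f".//{{{OAI}}}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token
```

To get the next page, a client would request `https://zenodo.org/oai2d?verb=ListRecords&resumptionToken=<token>`. It has to do so promptly, because the script's own notes say the token expires after about 2 minutes.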
## Zenodo specifics
- [Zenodo guide for developers](https://developers.zenodo.org/?python#sets)
- [example search by creator ORCID in the browser](https://zenodo.org/search?page=1&size=20&q=creators.orcid:%220000-0003-0555-4128%22)
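- how to do requests programmatically:
```python
import requests

orcid = '0000-0003-0555-4128'
# search Zenodo for all records that list this ORCID among the creators
params = {'q': 'creators.orcid:"' + orcid + '"'}
response = requests.get('https://zenodo.org/api/records', params=params)
data = response.json()
```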
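Zenodo returns search results in pages (the script's notes mention 10 hits per request). While more pages remain, each response's `links` dict holds a `next` URL, and `harvest_by_orcid` follows these links. Below is a self-contained sketch of the same loop. The HTTP call is passed in as a function so it can be swapped out or tested offline. The name `collect_all_hits` is hypothetical.

```python
import time


def collect_all_hits(first_page, fetch, slow_down=2.0):
    """Follow page['links']['next'] until no further page exists.

    first_page: parsed JSON of the first /api/records response
    fetch:      callable taking a URL and returning that page's parsed JSON
    slow_down:  seconds to wait before each follow-up request (rate limit)
    """
    hits = list(first_page["hits"]["hits"])
    page = first_page
    # .get() guards against a page that has no 'links' entry at all
    while "next" in page.get("links", {}):
        time.sleep(slow_down)
        page = fetch(page["links"]["next"])
        hits.extend(page["hits"]["hits"])
    return hits
```

With `requests`, `fetch` could be `lambda url: requests.get(url).json()`, and `first_page` the `data` dict from the request above.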
## Using this script
### Quick Start: directly
Run the script directly: its `__main__` block harvests a hard-coded list of ORCIDs.
Afterwards, **./json-results/** contains one folder per ORCID with all Zenodo entries found for that ID.
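The body of `save_hits_locally` is not shown here. The sketch below shows one possible layout. It assumes one JSON file per hit, named by its position (`0.json`, `1.json`, ...), inside a folder per ORCID. Both the function name `save_hits_sketch` and the file naming are assumptions.

```python
import json
import os


def save_hits_sketch(orcid, hits, base_dir="json-results"):
    """Write each hit to <base_dir>/<orcid>/<i>.json and return the folder path.

    Hypothetical layout; the script's save_hits_locally may name files differently.
    """
    folder = os.path.join(base_dir, orcid)
    os.makedirs(folder, exist_ok=True)  # create the per-ORCID folder if missing
    for i, hit in enumerate(hits):
        with open(os.path.join(folder, f"{i}.json"), "w", encoding="utf-8") as fh:
            json.dump(hit, fh, ensure_ascii=False, indent=2)
    return folder
```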
### Quick Start: in your own code
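```python
# harvest_by_orcid is defined in this repository's script
creators_orcids = ['0000-0003-0555-4128']
# pass save_locally=False to skip writing ./json-results/
hits = harvest_by_orcid(creators_orcids)
# hits maps each ORCID to the list of its Zenodo records
```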
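The sketch below consumes the dict returned by `harvest_by_orcid`. It assumes each hit carries its title under `hit['metadata']['title']`, as in Zenodo's records API JSON. The helper name `titles_by_orcid` is hypothetical.

```python
def titles_by_orcid(hits_by_orcid):
    """Map each ORCID to the titles of its hits.

    Assumes Zenodo's record JSON keeps the title under hit['metadata']['title'];
    hits without a title are reported as '<untitled>'.
    """
    return {
        orcid: [hit.get("metadata", {}).get("title", "<untitled>") for hit in hits]
        for orcid, hits in hits_by_orcid.items()
    }
```

For example, `titles_by_orcid(hits)` after the call above lists what was found per ORCID. An empty list means no records matched that ORCID.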
@@ -12,7 +12,6 @@
## Status: {in development}
##################################################
# ToDo: does the Zenodo API return ALL results? No, only 10 results per request.
# ToDo: harvest by FAU tag (organization; exact field name still to be checked)
# Note: a resumption token is only valid for 2 min; an expired one causes a 422 Unprocessable Entity error
@@ -25,9 +24,6 @@ import os
import time
# import pandas as pd
# -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
# HELPER
# -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
@@ -82,7 +78,14 @@ def save_hits_locally(orcid, hits):
i += 1
-def request_json(url, slow_down, params=None ):
+def request_json(url, slow_down, params=None):
"""
retrieves the Zenodo response for a normal request, or for the follow-up (next page) requests needed for people with many entries
:param url: URL to request
:param slow_down: seconds to sleep before the request, to respect Zenodo's request limit
:param params: request parameters, e.g. the query containing the ORCID
:return: dict containing the parsed JSON response
"""
time.sleep(slow_down)
response = requests.get(url, params)
data = response.json()
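`request_json` sleeps a fixed `slow_down` before every request. A different approach, not what the script does, is a throttle that sleeps only for whatever remains of a minimum interval since the previous request. The sketch below does that; `requests_per_minute` is an assumed setting. At the 120 requests per minute mentioned in the script's comments, the interval is 0.5 s. The script's default `slow_down=2` corresponds to 30 requests per minute.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, requests_per_minute):
        self.interval = 60.0 / requests_per_minute  # seconds between requests
        self._last = None  # monotonic timestamp of the previous request

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            # sleep only for the part of the interval that has not yet passed
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

A caller would create `throttle = Throttle(120)` once and call `throttle.wait()` before each `requests.get(...)`.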
@@ -109,55 +112,46 @@ def harvest_by_community():
print("WIP")
-def harvest_by_orcid(orcids):
+def harvest_by_orcid(orcids, save_locally=True, slow_down=2):
"""
https://developers.zenodo.org/?python#changes
https://zenodo.org/search?page=1&size=20&q=creators.orcid:%220000-0001-7430-3694%22
https://zenodo.org/oai2d
uses the Zenodo API to request all entries for the users given by their ORCIDs
example use: see __main__
:param orcids: list of creator ORCIDs as strings
:param save_locally: if True, also save each ORCID's hits under ./json-results/
:param slow_down: seconds to sleep between requests; zenodo.org enforces a request limit, and the default keeps the script below the per-hour limit
:return: dict mapping each ORCID to the list of all its results
"""
# request limit, so sleep is needed
slow_down = 2
# return dict with results of queries
orcid_to_hits_dicts = {}
for orcid in orcids:
# at most 120 requests per minute are allowed; sleeping keeps a safety margin
time.sleep(slow_down)
#query = 'creators.orcid:"' + orcid + '"'
#response = requests.get('https://zenodo.org/api/records', params={'q': query})
#data = response.json()
data = request_json('https://zenodo.org/api/records', slow_down, params={'q': 'creators.orcid:"' + orcid + '"'})
if 'status' in data:
# something went wrong, proceed to error handling
slow_down = error_handling(data['status'], slow_down)
elif 'hits' in data:
# entry successfully found, harvest data, check if next entry page exists
orcid_hits = data['hits']['hits']
# more than 10 hits: request the next page(s) of 10
if 'links' in data:
while 'next' in data['links']:
-print('requesting more...')
+print(orcid, 'requesting more...')
next = data['links']['next']
data = request_json(next, slow_down)
orcid_hits += data['hits']['hits']
# saves hits to nested folder structure
-save_hits_locally(orcid, orcid_hits)
+if save_locally:
+save_hits_locally(orcid, orcid_hits)
# collect hits list into a dict, which is the functions return value
orcid_to_hits_dicts[orcid] = orcid_hits
return orcid_to_hits_dicts
if __name__ == '__main__':
creators_orcids = ['0000-0001-7430-3694', '0000-0002-8824-6405', '0000-0003-2136-0788', '0000-0002-8273-6059']
creators_orcids = ['0000-0003-0555-4128']
# creators_orcids = ['0000-0002-8273-6059']
creators_orcids = ['0000-0001-7430-3694', '0000-0003-0555-4128', '0000-0002-8273-6059']
# creators_orcids = ['0000-0003-0555-4128']
hits = harvest_by_orcid(creators_orcids)
-print("wow")
+print("Done")
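`error_handling` takes the response's `status` and the current `slow_down` and returns the delay to use next. Its body is not part of this diff. Below is a hypothetical sketch of such a function. It assumes `status` is the integer HTTP code, and that HTTP 429 (Too Many Requests) should double the delay up to a cap while any other status leaves the delay unchanged.

```python
def error_handling_sketch(status, slow_down, max_delay=60.0):
    """Return the delay (in seconds) to use for the next request.

    Hypothetical policy: on HTTP 429 (Too Many Requests) double the delay,
    capped at max_delay; any other status keeps the current delay.
    """
    if status == 429:
        new_delay = min(slow_down * 2, max_delay)
        print(f"rate limited (429), increasing delay to {new_delay} s")
        return new_delay
    print(f"request failed with status {status}")
    return slow_down
```

With this policy, a caller could retry the same ORCID after waiting the returned delay instead of moving on to the next one.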