User guide
Install
Install using pip:
pip install pyncei
Alternatively, you can use the environment.yml file included in the GitHub repository to build a conda environment and install pyncei there:
conda env create -f environment.yml
conda activate pyncei
pip install pyncei
This method includes geopandas, which is absent from the pip
installation. If geopandas is installed, the to_dataframe() method
returns a GeoDataFrame when coordinates are provided by NCEI.
Getting started
To use the NCEI web services, you’ll need a token. The token is a
32-character string provided by NCEI; you can request one from NCEI.
Pass the token to NCEIBot to get started:
from pyncei import NCEIBot
ncei = NCEIBot("ExampleNCEIAPIToken")
You can cache queries by using the cache_name parameter when creating an
NCEIBot object:
ncei = NCEIBot("ExampleNCEIAPIToken", cache_name="ncei_cache")
The cache uses CachedSession from the requests-cache module. Caching
behavior can be modified by passing keyword arguments accepted by that
class to NCEIBot. For example, when the cache is enabled, successful
requests are cached indefinitely by default. You can change this
behavior using the expire_after keyword argument when initializing an
NCEIBot object.
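As a minimal sketch of that option, the snippet below passes expire_after
alongside cache_name. The 30-day lifetime and the make_bot helper are
illustrative assumptions, not part of pyncei; expire_after itself is a
requests-cache setting that accepts a timedelta.

```python
from datetime import timedelta

# Options forwarded by NCEIBot to requests-cache's CachedSession.
# The 30-day lifetime is an arbitrary value chosen for illustration.
cache_options = {
    "cache_name": "ncei_cache",
    "expire_after": timedelta(days=30),
}


def make_bot(token):
    """Create an NCEIBot whose cached responses expire after 30 days.

    Hypothetical helper, shown only to keep the options in one place.
    """
    # Imported here so the options above can be inspected without pyncei
    from pyncei import NCEIBot

    return NCEIBot(token, **cache_options)
```

Calling make_bot("ExampleNCEIAPIToken") would then behave like the
cache_name example above, except that cached responses are refreshed
once they are older than the chosen lifetime.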
NCEIBot includes methods corresponding to each of the endpoints
described on the Climate Data Online (CDO) website. Query parameters
specified by CDO can be passed as arguments:
response = ncei.get_data(
    datasetid="GHCND",
    stationid=["GHCND:USC00186350"],
    datatypeid=["TMIN", "TMAX"],
    startdate="2015-12-01",
    enddate="2015-12-02",
)
Each method call may make multiple requests to the API (for example, if
more than 1,000 daily records are requested). Responses are combined in
an NCEIResponse object, which extends the list class. Individual
responses can be accessed using list methods, for example, by iterating
through the object or accessing a single item using its index. Data from
all responses can be accessed using the values() method, which returns
an iterator of dicts, each of which is a single result:
for val in response.values():
    print(val)
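The relationship between the list of responses and values() can be
pictured with a small stand-in class. This is a sketch of the pattern
only, not pyncei's implementation: PagedResponse is a hypothetical name,
the pages are plain dicts with a "results" key (the layout CDO uses in
its JSON), and the records are made-up values.

```python
class PagedResponse(list):
    """Hypothetical stand-in for NCEIResponse: a list of per-request pages."""

    def values(self):
        """Yield each result dict from every page, in order."""
        for page in self:
            # Each page is assumed to hold its records under "results"
            yield from page.get("results", [])


# Two fake pages standing in for two separate API requests
combined = PagedResponse()
combined.append({"results": [{"datatype": "TMIN", "value": -5}]})
combined.append({"results": [{"datatype": "TMAX", "value": 20}]})

first_page = combined[0]           # list access returns one whole page
records = list(combined.values())  # values() flattens across all pages
```

Indexing and iteration operate on whole responses, while values() hides
the page boundaries and exposes one record at a time.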
The response object includes a to_csv() method to write results to a
file:
response.to_csv("station_data.csv")
It also includes a to_dataframe() method, which returns the results as a
pandas DataFrame (or a geopandas GeoDataFrame if that module is
installed and the results include coordinates):
df = response.to_dataframe()
The table below provides an overview of the available endpoints and their corresponding methods:
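CDO Endpoint | CDO Query Parameter | NCEIBot Method | Values |
---|---|---|---|
datasets | datasetid | | |
datacategories | datacategoryid | | |
datatypes | datatypeid | | |
locationcategories | locationcategoryid | | |
locations | locationid | | |
stations | stationid | get_stations() | – |
data | – | get_data() | – |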
Each of the NCEIBot get methods accepts either a single positional
argument (used to return data for a single entity) or a series of
keyword arguments (used to search for and retrieve all matching
entities). Unlike CDO, which accepts only ids, NCEIBot will try to work
with either ids or name strings. If names are provided, NCEIBot attempts
to map the name strings to valid ids using find_ids():
ncei.find_ids("District of Columbia", "locations")
If a unique match cannot be found, find_ids() returns all identifiers
that contain the search term. If you have no idea what data is available
or where to look, you can search across all endpoints by omitting the
endpoint argument:
ncei.find_ids("temperature")
Alternatively, you can browse the source files in the Values column of
the table above. The data in these files shouldn’t change much, but they
can be updated using refresh_lookups() if necessary:
ncei.refresh_lookups()
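The name matching described for find_ids() (a unique match if one
exists, otherwise every entry containing the search term) can be
sketched as follows. This illustrates the idea and is not pyncei's code:
match_ids and LOCATIONS are hypothetical names, the table holds a few US
state FIPS codes rather than the real lookup files, and matching on
names case-insensitively is an assumption.

```python
# Hypothetical lookup table mapping ids to names (US state FIPS codes)
LOCATIONS = {
    "FIPS:11": "District of Columbia",
    "FIPS:24": "Maryland",
    "FIPS:51": "Virginia",
    "FIPS:54": "West Virginia",
}


def match_ids(term, lookup):
    """Return the id of a unique exact name match, else all partial matches."""
    needle = term.lower()
    exact = [key for key, name in lookup.items() if name.lower() == needle]
    if len(exact) == 1:
        return exact
    # No unique match: fall back to every name that contains the term
    return [key for key, name in lookup.items() if needle in name.lower()]


unique = match_ids("District of Columbia", LOCATIONS)
partial = match_ids("virg", LOCATIONS)
```

Here the full name resolves to a single id, while the fragment "virg"
returns both entries whose names contain it.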
Example: Find and return data from a station
from datetime import date
from pyncei import NCEIBot, NCEIResponse
# Initialize NCEIBot object using your token string
ncei = NCEIBot("ExampleNCEIAPIToken", cache_name="ncei")
# Set the date range
mindate = date(2016, 1, 1) # either yyyy-mm-dd or a datetime object
maxdate = date(2019, 12, 31)
# Get all DC stations operating between mindate and maxdate
stations = ncei.get_stations(
    datasetid="GHCND",
    datatypeid=["TMIN", "TMAX"],
    locationid="FIPS:11",
    startdate=mindate,
    enddate=maxdate,
)
# Select the station with the best data coverage
station = sorted(stations.values(), key=lambda s: -float(s["datacoverage"]))[0]
# Get temperature data for the given dates. Note that for the
# data endpoint, you can't request more than one year's worth of daily
# data at a time.
year = maxdate.year
response = NCEIResponse()
while year >= mindate.year:
    response.extend(
        ncei.get_data(
            datasetid="GHCND",
            stationid=station["id"],
            datatypeid=["TMIN", "TMAX"],
            startdate=date(year, 1, 1),
            enddate=date(year, 12, 31),
        )
    )
    year -= 1
# Save values to CSV using the to_csv method
response.to_csv(station["id"].replace(":", "") + ".csv")
# Alternatively, merge observation and station data together in a pandas
# DataFrame. If geopandas is installed and coordinates are given, this
# method will return a GeoDataFrame instead.
df_stations = stations.to_dataframe()
df_response = response.to_dataframe()
df_merged = df_stations.merge(df_response, left_on="id", right_on="station")
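The example above requests whole calendar years. If the date range
starts or ends mid-year, the one-year limit on the data endpoint can
still be respected by splitting the range into per-year pieces clamped
to the requested bounds. The helper below is a minimal sketch using only
the standard library; yearly_ranges is a hypothetical name, not part of
pyncei, and it assumes the start date is not later than the end date.

```python
from datetime import date


def yearly_ranges(start, end):
    """Yield (start, end) date pairs, one per calendar year, within bounds."""
    for year in range(start.year, end.year + 1):
        # Clamp the first and last chunks to the requested range
        chunk_start = max(start, date(year, 1, 1))
        chunk_end = min(end, date(year, 12, 31))
        yield chunk_start, chunk_end


ranges = list(yearly_ranges(date(2016, 3, 15), date(2018, 6, 30)))
```

Each pair could then be passed as startdate and enddate to get_data()
in place of the fixed January 1 and December 31 dates.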