Sharing data sets between chapters #16

rgommers · 2017-07-27T06:47:56Z

From Debra's email: Matt Rocklin suggested using some data sets in common through the book, so feel free to coordinate with others on the project. The Dask chapter will also be written using the data and projects described in some of the other chapters.

@mrocklin do you have an overview of data sets already in use? For the SciPy chapter we'd be happy to reuse something as well.

rgommers · 2017-07-27T06:49:02Z

Cc @WarrenWeckesser @ev-br

mrocklin · 2017-07-27T11:55:18Z

> @mrocklin do you have an overview of data sets already in use? For the SciPy chapter we'd be happy to reuse something as well.

I personally have no exposure to what people have been doing. I like the idea of coordinating on datasets and examples, but have taken no concrete steps in this direction.

Perhaps this issue is such a step? If others are around, it might be useful for each of us to list the constraints our sections place on datasets, along with some datasets that we know about and appreciate.

For example for dask we have the following constraints:

It is useful if the data is inconveniently large, so that parallelism or out-of-core approaches can be relevant.
It is useful if functions used in other examples are serializable (this is usually the case).
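
For anyone wanting to check the second constraint on their own examples, here is a minimal sketch using only the standard library. The helper name `is_serializable` is hypothetical, and plain `pickle` is used as a conservative stand-in: as far as I know Dask's own serialization is more permissive (it can fall back to cloudpickle), so a function that fails this check may still work with Dask.

```python
import math
import pickle


def is_serializable(func):
    """Return True if `func` survives a plain-pickle round trip.

    Hypothetical helper for illustration; not part of Dask or SciPy.
    """
    try:
        restored = pickle.loads(pickle.dumps(func))
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return callable(restored)


# Importable functions are pickled by reference, so they pass.
print(is_serializable(math.sqrt))

# Lambdas have no importable name, so plain pickle rejects them.
print(is_serializable(lambda x: x + 1))
```

Functions defined at the top level of an importable module should behave like `math.sqrt` here; closures and lambdas are the usual cases that need a second look.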

Datasets that we've frequently used in tutorials and examples include the following:

The NYC Taxi dataset
Various meteorology datasets; ECMWF in particular has public downloads
Airlines
...

rgommers · 2017-08-02T07:44:48Z

> Perhaps this issue is such a step?

+1

For SciPy we are pretty flexible in terms of datasets to use. We do need:

time series data, for IIR/FIR functionality. EDIT: we've now added a data set for this, pressure measurements: pressure.dat
one dataset that is large enough for using scipy.LowLevelCallable sensibly

jbednar · 2018-04-28T03:44:23Z

We're using the measles incidence dataset highlighted in the Wall Street Journal a while back in our chapter (#26), along with some NYC taxi data, if anyone wants to use those.
