Metadata-Version: 2.1
Name: pooch
Version: 1.8.1
Summary: "Pooch manages your Python library's sample data files: it automatically downloads and stores them in a local directory, with support for versioning and corruption checks."
Home-page: https://github.com/fatiando/pooch
Author: The Pooch Developers
Author-email: fatiandoaterra@protonmail.com
Maintainer: "Leonardo Uieda"
Maintainer-email: leouieda@gmail.com
License: BSD 3-Clause License
Project-URL: Documentation, https://www.fatiando.org/pooch
Project-URL: Release Notes, https://github.com/fatiando/pooch/releases
Project-URL: Bug Tracker, https://github.com/fatiando/pooch/issues
Project-URL: Source Code, https://github.com/fatiando/pooch
Keywords: data,download,caching,http
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: platformdirs >=2.5.0
Requires-Dist: packaging >=20.0
Requires-Dist: requests >=2.19.0
Provides-Extra: progress
Requires-Dist: tqdm <5.0.0,>=4.41.0 ; extra == 'progress'
Provides-Extra: sftp
Requires-Dist: paramiko >=2.7.0 ; extra == 'sftp'
Provides-Extra: xxhash
Requires-Dist: xxhash >=1.4.3 ; extra == 'xxhash'

Part of the Fatiando a Terra project

## About

> Just want to download a file without messing with `requests` and `urllib`?
> Trying to add sample datasets to your Python package?
> **Pooch is here to help!**

*Pooch* is a **Python library** that can manage data by **downloading files**
from a server (only when needed) and storing them locally in a data **cache**
(a folder on your computer).

* Pure Python and minimal dependencies.
* Download files over HTTP, FTP, and from data repositories like Zenodo and figshare.
* Built-in post-processors to unzip/decompress the data after download.
* Designed to be extended: create custom downloaders and post-processors.

Are you a **scientist** or researcher? Pooch can help you too!

* Host your data on a repository and download using the DOI.
* Automatically download data using code instead of telling colleagues to do it themselves.
* Make sure everyone running the code has the same version of the data files.

## Projects using Pooch

[SciPy](https://github.com/scipy/scipy),
[scikit-image](https://github.com/scikit-image/scikit-image),
[Ensaio](https://github.com/fatiando/ensaio),
[MetPy](https://github.com/Unidata/MetPy),
[napari](https://github.com/napari/napari),
[icepack](https://github.com/icepack/icepack),
[histolab](https://github.com/histolab/histolab),
[seaborn-image](https://github.com/SarthakJariwala/seaborn-image),
[Open AR-Sandbox](https://github.com/cgre-aachen/open_AR_Sandbox),
[climlab](https://github.com/climlab/climlab),
[mne-python](https://github.com/mne-tools/mne-python),
[GemGIS](https://github.com/cgre-aachen/gemgis)

> If you're using Pooch, **send us a pull request** adding your project to the list.

## Example

For a **scientist downloading a data file** for analysis:

```python
import pooch
import pandas as pd

# Download a file and save it locally, returning the path to it.
# Running this again will not cause a download. Pooch will check the hash
# (checksum) of the downloaded file against the given value to make sure
# it's the right file (not corrupted or outdated).
fname_bathymetry = pooch.retrieve(
    url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
    known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
)

# Pooch can also download based on a DOI from certain providers.
fname_gravity = pooch.retrieve(
    url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
    known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
)

# Load the data with Pandas
data_bathymetry = pd.read_csv(fname_bathymetry)
data_gravity = pd.read_csv(fname_gravity)
```

For **package developers** including sample data in their projects:

```python
"""
Module mypackage/datasets.py
"""
import pkg_resources
import pandas
import pooch

# Get the version string from your project. You have one of these, right?
from . import version

# Create a new friend to manage your sample data storage
GOODBOY = pooch.create(
    # Folder where the data will be stored. For a sensible default, use the
    # default cache folder for your OS.
    path=pooch.os_cache("mypackage"),
    # Base URL of the remote data store. Will call .format on this string
    # to insert the version (see below).
    base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
    # Pooches are versioned so that you can use multiple versions of a
    # package simultaneously. Use a PEP 440 compliant version number. The
    # version will be appended to the path.
    version=version,
    # If a version has a "+XX.XXXXX" suffix, we'll assume that this is a dev
    # version and replace the version with this string.
    version_dev="main",
    # An environment variable that overwrites the path.
    env="MYPACKAGE_DATA_DIR",
    # The cache file registry. A dictionary with all files managed by this
    # pooch. Keys are the file names (relative to *base_url*) and values
    # are their respective SHA256 hashes. Files will be downloaded
    # automatically when needed (see fetch_gravity_data).
    # The hash below is a placeholder, not a real SHA256 digest.
    registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"},
)
# You can also load the registry from a file. Each line contains a file
# name and its SHA256 hash separated by a space. This makes it easier to
# manage large numbers of data files. The registry file should be packaged
# and distributed with your software.
GOODBOY.load_registry(
    pkg_resources.resource_stream("mypackage", "registry.txt")
)


# Define functions that your users can call to get back the data in memory
def fetch_gravity_data():
    """
    Load some sample gravity data to use in your docs.
    """
    # Fetch the path to a file in the local storage. If it's not there,
    # we'll download it.
    fname = GOODBOY.fetch("gravity-data.csv")
    # Load it with numpy/pandas/etc
    data = pandas.read_csv(fname)
    return data
```

## Getting involved

🗨️ **Contact us:**
Find out more about how to reach us at
[fatiando.org/contact](https://www.fatiando.org/contact/).

👩🏾‍💻 **Contributing to project development:**
Please read our
[Contributing Guide](https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md)
to see how you can help and give feedback.

🧑🏾‍🤝‍🧑🏼 **Code of conduct:**
This project is released with a
[Code of Conduct](https://github.com/fatiando/community/blob/main/CODE_OF_CONDUCT.md).
By participating in this project you agree to abide by its terms.

> **Imposter syndrome disclaimer:**
> We want your help. **No, really.** There may be a little voice inside your
> head that is telling you that you're not ready, that you aren't skilled
> enough to contribute. We assure you that the little voice in your head is
> wrong. Most importantly, **there are many valuable ways to contribute besides
> writing code**.
>
> *This disclaimer was adapted from the*
> [MetPy project](https://github.com/Unidata/MetPy).

## License

This is free software: you can redistribute it and/or modify it under the terms
of the **BSD 3-clause License**.
A copy of this license is provided in [`LICENSE.txt`](https://github.com/fatiando/pooch/blob/main/LICENSE.txt).
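
Returning to the registry file described in the package-developer example (one file name and its SHA256 hash per line, separated by a space): the sketch below shows one way such lines could be produced using only the Python standard library. It is an illustration, not Pooch's own implementation. The helper names `sha256_of` and `registry_lines` are hypothetical, and Pooch ships its own helpers for this purpose (`pooch.file_hash` and `pooch.make_registry`), which are likely the better choice in practice.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # Read fixed-size chunks so large data files are never fully in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def registry_lines(data_dir):
    """Build "<file name> <sha256>" lines for every file under data_dir."""
    data_dir = Path(data_dir)
    lines = []
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            # POSIX-style relative names, so they can be appended to a base URL.
            name = path.relative_to(data_dir).as_posix()
            lines.append(f"{name} {sha256_of(path)}")
    return lines
```

Writing the result with something like `Path("registry.txt").write_text("\n".join(registry_lines("data")) + "\n")` would give a file in the layout that the `load_registry` comment above describes. The `data` folder name here is an assumption for illustration.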