{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ArchDataPy Tutorial\n", " \n", "This short tutorial shows how to use the `archdatapy` package to access the archaeological datasets from the R `archdata` package hosted on CRAN. It covers downloading the datasets and loading them into a Python session as `pandas` DataFrames for analysis.\n", " \n", "## Libraries\n", " \n", "First, import the necessary libraries. The `archdatapy` package provides two main functions:\n", " - `get_archdata`: Downloads the `archdata` package, extracts its datasets, and returns a dictionary mapping dataset names to file paths.\n", " - `load_archdata`: Loads a specified `.rda` file and returns a dictionary of the objects it contains (for tabular datasets, `pandas` DataFrames).\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from archdatapy import get_archdata, load_archdata\n", "from pprint import pprint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Download and List Available Datasets\n", " \n", "The first step is to call `get_archdata`, which downloads the `archdata` package from CRAN, extracts the datasets, and returns a dictionary mapping dataset names to file paths.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Available datasets and their file paths:\n", "Acheulean: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\Acheulean.rda\n", "Arnhofen: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\Arnhofen.rda\n", "BACups: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\BACups.rda\n", "BarmoseI.grid: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\BarmoseI.grid.rda\n", "BarmoseI.pp: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\BarmoseI.pp.rda\n", "Bornholm: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\Bornholm.rda\n", "DartPoints: 
C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\DartPoints.rda\n", "EIAGraves: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\EIAGraves.rda\n", "EndScrapers: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\EndScrapers.rda\n", "EngrBone: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\EngrBone.rda\n", "ESASites: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\ESASites.rda\n", "EWBurials: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\EWBurials.rda\n", "Fibulae: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\Fibulae.rda\n", "Handaxes: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\Handaxes.rda\n", "MaskSite: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\MaskSite.rda\n", "Mesolithic: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\Mesolithic.rda\n", "Michelsberg: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\Michelsberg.rda\n", "Nelson: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\Nelson.rda\n", "Olorgesailie.maj: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\Olorgesailie.maj.rda\n", "Olorgesailie.sub: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\Olorgesailie.sub.rda\n", "OxfordPots: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\OxfordPots.rda\n", "PitHouses: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\PitHouses.rda\n", "RBGlass1: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\RBGlass1.rda\n", "RBGlass2: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\RBGlass2.rda\n", "RBPottery: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\RBPottery.rda\n", "Snodgrass: 
C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\Snodgrass.rda\n", "TRBPottery: C:\\Users\\carleton\\AppData\\Local\\Temp\\tmpt4bzzzkp\\archdata/data\\TRBPottery.rda\n" ] } ], "source": [ "# Download datasets and get the file paths\n", "file_paths = get_archdata()\n", "print(\"Available datasets and their file paths:\")\n", "for dataset_name, path in file_paths.items():\n", " print(f\"{dataset_name}: {path}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After running this, you should see a list of dataset names along with their file paths. Each key in the `file_paths` dictionary is a dataset name (based on the file name without extension), and each value is the full path to the corresponding `.rda` file.\n", "\n", "## Step 2: Loading a Specific Dataset\n", " \n", "Now that we have the paths, we can use `load_archdata` to load a specific dataset. Choose one of the dataset names from the output above, then pass its file path to `load_archdata`. This function will load the contents of the `.rda` file and return a dictionary of objects." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Contents of the dataset 'Acheulean':\n", "Object name: Acheulean, Object type: <class 'pandas.core.frame.DataFrame'>\n" ] } ], "source": [ "# Choose a dataset to load (any key of file_paths works)\n", "dataset_name = 'Acheulean'  # e.g., 'Handaxes' or 'DartPoints'\n", "data = load_archdata(file_paths[dataset_name])\n", "\n", "# Inspect the loaded data\n", "print(f\"Contents of the dataset '{dataset_name}':\")\n", "for obj_name, obj_value in data.items():\n", "    print(f\"Object name: {obj_name}, Object type: {type(obj_value)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For datasets that are simple tables (R data frames), each value in the dictionary that `load_archdata` returns (via its underlying call to `pyreadr`) is a `pandas` DataFrame. To display one in the notebook, index the dictionary with the object name, which here matches the dataset name:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Lat</th><th>Long</th><th>HA</th><th>CL</th><th>KN</th><th>FS</th><th>D</th><th>CS</th><th>P</th><th>CH</th><th>SP</th><th>OLT</th><th>SS</th><th>OST</th></tr>\n",
"    <tr><th>rownames</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>Olorgesailie</th><td>-1.58</td><td>36.45</td><td>197</td><td>96</td><td>58</td><td>17</td><td>5</td><td>11</td><td>3</td><td>32</td><td>52</td><td>6</td><td>213</td><td>218</td></tr>\n",
"    <tr><th>Isimila</th><td>-7.90</td><td>35.61</td><td>246</td><td>208</td><td>30</td><td>28</td><td>6</td><td>30</td><td>16</td><td>62</td><td>17</td><td>15</td><td>98</td><td>64</td></tr>\n",
"    <tr><th>Kalambo Falls</th><td>-8.60</td><td>31.24</td><td>337</td><td>264</td><td>59</td><td>96</td><td>8</td><td>124</td><td>18</td><td>69</td><td>6</td><td>17</td><td>303</td><td>48</td></tr>\n",
"    <tr><th>Lochard</th><td>-19.92</td><td>29.02</td><td>45</td><td>13</td><td>3</td><td>2</td><td>12</td><td>1</td><td>0</td><td>32</td><td>3</td><td>8</td><td>46</td><td>22</td></tr>\n",
"    <tr><th>Kariandusi</th><td>-0.45</td><td>36.26</td><td>132</td><td>56</td><td>47</td><td>23</td><td>3</td><td>5</td><td>7</td><td>6</td><td>5</td><td>8</td><td>17</td><td>25</td></tr>\n",
"    <tr><th>Broken Hill</th><td>-14.43</td><td>28.45</td><td>1</td><td>8</td><td>1</td><td>1</td><td>0</td><td>1</td><td>0</td><td>4</td><td>25</td><td>0</td><td>35</td><td>18</td></tr>\n",
"    <tr><th>Nsongezi</th><td>-1.03</td><td>30.78</td><td>15</td><td>19</td><td>2</td><td>9</td><td>1</td><td>28</td><td>1</td><td>19</td><td>0</td><td>10</td><td>17</td><td>70</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Lat Long HA CL KN FS D CS P CH SP OLT SS \\\n", "rownames \n", "Olorgesailie -1.58 36.45 197 96 58 17 5 11 3 32 52 6 213 \n", "Isimila -7.90 35.61 246 208 30 28 6 30 16 62 17 15 98 \n", "Kalambo Falls -8.60 31.24 337 264 59 96 8 124 18 69 6 17 303 \n", "Lochard -19.92 29.02 45 13 3 2 12 1 0 32 3 8 46 \n", "Kariandusi -0.45 36.26 132 56 47 23 3 5 7 6 5 8 17 \n", "Broken Hill -14.43 28.45 1 8 1 1 0 1 0 4 25 0 35 \n", "Nsongezi -1.03 30.78 15 19 2 9 1 28 1 19 0 10 17 \n", "\n", " OST \n", "rownames \n", "Olorgesailie 218 \n", "Isimila 64 \n", "Kalambo Falls 48 \n", "Lochard 22 \n", "Kariandusi 25 \n", "Broken Hill 18 \n", "Nsongezi 70 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[dataset_name]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And, that's it! Time to analyze the data you downloaded." ] } ], "metadata": { "kernelspec": { "display_name": "launchpad", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 2 }