ArchDataPy Tutorial

This short tutorial will walk you through using the archdatapy package to access archaeological datasets from the R archdata package hosted on CRAN. The tutorial covers how to download these datasets and load them into a Python session as pandas DataFrames for easy analysis.

Libraries

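First, import the necessary libraries. The archdatapy package provides two main functions:

- get_archdata: Downloads the archdata package and extracts its datasets, returning a dictionary of dataset names and file paths.
- load_archdata: Loads a specified .rda file into Python and returns a dictionary of the objects it contains, typically pandas DataFrames.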

[1]:
from archdatapy import get_archdata, load_archdata
from pprint import pprint

Step 1: Download and List Available Datasets

The first step is to use get_archdata to download the archdata package from CRAN, extract the datasets, and return a dictionary containing the dataset names and file paths.

[2]:
# Download datasets and get the file paths
file_paths = get_archdata()
print("Available datasets and their file paths:")
for dataset_name, path in file_paths.items():
    print(f"{dataset_name}: {path}")
Available datasets and their file paths:
Acheulean: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Acheulean.rda
Arnhofen: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Arnhofen.rda
BACups: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\BACups.rda
BarmoseI.grid: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\BarmoseI.grid.rda
BarmoseI.pp: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\BarmoseI.pp.rda
Bornholm: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Bornholm.rda
DartPoints: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\DartPoints.rda
EIAGraves: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\EIAGraves.rda
EndScrapers: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\EndScrapers.rda
EngrBone: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\EngrBone.rda
ESASites: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\ESASites.rda
EWBurials: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\EWBurials.rda
Fibulae: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Fibulae.rda
Handaxes: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Handaxes.rda
MaskSite: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\MaskSite.rda
Mesolithic: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Mesolithic.rda
Michelsberg: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Michelsberg.rda
Nelson: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Nelson.rda
Olorgesailie.maj: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Olorgesailie.maj.rda
Olorgesailie.sub: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Olorgesailie.sub.rda
OxfordPots: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\OxfordPots.rda
PitHouses: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\PitHouses.rda
RBGlass1: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\RBGlass1.rda
RBGlass2: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\RBGlass2.rda
RBPottery: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\RBPottery.rda
Snodgrass: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\Snodgrass.rda
TRBPottery: C:\Users\carleton\AppData\Local\Temp\tmpt4bzzzkp\archdata/data\TRBPottery.rda

After running this, you should see a list of dataset names along with their file paths. Each key in the file_paths dictionary is a dataset name (based on the file name without extension), and each value is the full path to the corresponding .rda file.
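The naming rule described above (file name without the extension) can be sketched with pathlib. This is only an illustration of the rule, not archdatapy's actual implementation: index_rda_files is a hypothetical helper, and the files it scans here are empty placeholders in a temporary directory.

```python
from pathlib import Path
import tempfile


def index_rda_files(data_dir):
    """Map dataset names to .rda paths (hypothetical helper, not part of archdatapy)."""
    # Path.stem strips only the final suffix, so "BarmoseI.grid.rda" -> "BarmoseI.grid".
    return {p.stem: str(p) for p in sorted(Path(data_dir).glob("*.rda"))}


# Demonstrate on a throwaway directory holding empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for file_name in ("Acheulean.rda", "BarmoseI.grid.rda", "README.txt"):
        (Path(tmp) / file_name).touch()
    demo_paths = index_rda_files(tmp)

print(sorted(demo_paths))
```

Note that dotted names such as BarmoseI.grid keep their inner dot, which matches the keys shown in the output above.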

Step 2: Load a Specific Dataset

Now that we have the paths, we can use load_archdata to load a specific dataset. Choose one of the dataset names from the output above, then pass its file path to load_archdata. This function will load the contents of the .rda file and return a dictionary of objects.

[3]:
# Choose a dataset to load (any key of file_paths works)
dataset_name = 'Acheulean'  # e.g., 'Handaxes' or 'DartPoints'
data = load_archdata(file_paths[dataset_name])

# Inspect the loaded data
print(f"Contents of the dataset '{dataset_name}':")
for obj_name, obj_value in data.items():
    print(f"Object name: {obj_name}, Object type: {type(obj_value)}")
Contents of the dataset 'Acheulean':
Object name: Acheulean, Object type: <class 'pandas.core.frame.DataFrame'>
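Because file_paths is a plain dictionary, a mistyped dataset name fails with a bare KeyError. One possible convenience, sketched here under the assumption that you want spelling suggestions, is a small lookup wrapper built on the standard library's difflib. The helper resolve_dataset_name is hypothetical and not part of archdatapy, and demo_paths is a stand-in with placeholder paths.

```python
from difflib import get_close_matches


def resolve_dataset_name(file_paths, name):
    """Return the path for `name`, or raise KeyError listing similar names.

    Hypothetical helper, not part of archdatapy.
    """
    if name in file_paths:
        return file_paths[name]
    suggestions = get_close_matches(name, list(file_paths), n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise KeyError(f"No dataset named {name!r}.{hint}")


# Stand-in for the dictionary returned by get_archdata(); the paths are placeholders.
demo_paths = {
    "Acheulean": "data/Acheulean.rda",
    "Handaxes": "data/Handaxes.rda",
    "Nelson": "data/Nelson.rda",
}

print(resolve_dataset_name(demo_paths, "Handaxes"))

try:
    resolve_dataset_name(demo_paths, "Acheulian")  # misspelled on purpose
except KeyError as err:
    message = err.args[0]
    print(message)
```

In a real session you would pass the dictionary returned by get_archdata instead of demo_paths, then hand the resolved path to load_archdata.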

For datasets that are simple tables (DataFrames), we can display them directly in the notebook by indexing the returned dictionary with the object's name. The dictionary itself comes from the underlying call to pyreadr:

[4]:
data[dataset_name]
[4]:
                  Lat   Long   HA   CL  KN  FS   D   CS   P  CH  SP  OLT   SS  OST
rownames
Olorgesailie    -1.58  36.45  197   96  58  17   5   11   3  32  52    6  213  218
Isimila         -7.90  35.61  246  208  30  28   6   30  16  62  17   15   98   64
Kalambo Falls   -8.60  31.24  337  264  59  96   8  124  18  69   6   17  303   48
Lochard        -19.92  29.02   45   13   3   2  12    1   0  32   3    8   46   22
Kariandusi      -0.45  36.26  132   56  47  23   3    5   7   6   5    8   17   25
Broken Hill    -14.43  28.45    1    8   1   1   0    1   0   4  25    0   35   18
Nsongezi        -1.03  30.78   15   19   2   9   1   28   1  19   0   10   17   70

And that's it! Time to analyze the data you downloaded.
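As a possible first analysis step, the sketch below converts the counts for each site into row shares. It rebuilds the Acheulean table by hand from the values displayed earlier so that it runs without a download. It assumes that every column other than Lat and Long holds artifact counts, which the integer values suggest but which should be checked against the archdata documentation.

```python
import pandas as pd

# Values copied from the Acheulean table displayed earlier in this tutorial.
columns = ["Lat", "Long", "HA", "CL", "KN", "FS", "D", "CS",
           "P", "CH", "SP", "OLT", "SS", "OST"]
rows = {
    "Olorgesailie": [-1.58, 36.45, 197, 96, 58, 17, 5, 11, 3, 32, 52, 6, 213, 218],
    "Isimila": [-7.90, 35.61, 246, 208, 30, 28, 6, 30, 16, 62, 17, 15, 98, 64],
    "Kalambo Falls": [-8.60, 31.24, 337, 264, 59, 96, 8, 124, 18, 69, 6, 17, 303, 48],
    "Lochard": [-19.92, 29.02, 45, 13, 3, 2, 12, 1, 0, 32, 3, 8, 46, 22],
    "Kariandusi": [-0.45, 36.26, 132, 56, 47, 23, 3, 5, 7, 6, 5, 8, 17, 25],
    "Broken Hill": [-14.43, 28.45, 1, 8, 1, 1, 0, 1, 0, 4, 25, 0, 35, 18],
    "Nsongezi": [-1.03, 30.78, 15, 19, 2, 9, 1, 28, 1, 19, 0, 10, 17, 70],
}
acheulean = pd.DataFrame.from_dict(rows, orient="index", columns=columns)
acheulean.index.name = "rownames"

# Assumption: all columns except the coordinates are artifact counts.
counts = acheulean.drop(columns=["Lat", "Long"])
totals = counts.sum(axis=1)

# Divide each row by its own total to get per-site shares.
shares = counts.div(totals, axis=0)
print((shares * 100).round(1))
```

In a live session, the hand-built table can be replaced with acheulean = data[dataset_name] from Step 2; the remaining lines stay the same.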