NEOTOMA PALEOECOLOGY DATABASE

Making 60,000 Taxonomic Records Browsable for Scientists Who Don't Think in SQL

Designed and built an interactive radial taxonomy browser for the NSF-funded Neotoma Paleoecology Database, informed by expert interviews coded with grounded theory, Activity Theory modeling, and think-aloud usability evaluation with domain experts.

UX Research, Data Visualization, D3.js, Usability Testing, Information Architecture
Neotoma Taxonomy Hierarchical Visualizer

The tl;dr
Overview

A taxonomy browser for the world's largest paleoecology database

The Neotoma Paleoecology Database is an NSF-funded, open-access platform holding 60,000+ taxonomic records used by Earth scientists studying climate change, biodiversity, and biogeochemical cycles. The data follows FAIR principles (Findable, Accessible, Interoperable, Reusable), but the taxonomy itself was anything but easy to browse.

Scientists navigated taxonomic hierarchies through raw SQL queries or flat text lists. There was no visual interface to explore how species, genera, families, and phyla related to each other, no way to trace synonym histories, and no tool to understand where a taxon sat within the larger classification structure.

Three contradictions hiding in the workflow

Through Phase I expert interviews (N=7), I identified three systemic contradictions using Activity Theory:

1. The broken toolchain. Rich metadata entered in data entry tools (Tilia) became invisible in discovery tools. Analysis moved "off-books" to R scripts, preventing data cleaning feedback loops.

2. The lumping vs. splitting paradox. Data entry rules forced "splitting" (preserving original taxonomic names from authors), but analysis required "lumping" (standardized regional comparisons). Current tools had no dynamic aggregation capability.

3. Taxonomy as "history of science." Taxonomic names aren't stable labels. They carry histories of reclassification and synonymy, plus uncertainty qualifiers such as "cf.", "aff.", and "type sp.". Existing interfaces treated them as flat strings.


Results from two research phases

The project followed a user-centered design model: needs assessment from expert interviews, prototype design informed by grounded theory findings, and formative evaluation through think-aloud testing.

60k+: taxonomic records made browsable through a radial interface
7 + 5: expert interviews (Phase I) + think-aloud participants (Phase II)
82-100%: task success rates across three evaluation tasks
Live: working prototype deployed for the Neotoma community

Understanding how paleoecologists actually work with taxonomy

01
Expert Interviews with Grounded Theory Coding

I conducted semi-structured interviews with 7 domain experts (data stewards, researchers, database developers) to understand their actual taxonomy workflows. Using inductive grounded theory coding (open, axial, selective), I surfaced patterns that wouldn't have emerged from a feature wishlist.

The key insight: taxonomy in paleoecology isn't a stable reference system. It's a living record of scientific disagreement. A single pollen grain might be classified differently by two researchers based on regional conventions. The visualization needed to surface this uncertainty, not hide it.

02
Activity Theory Contradiction Analysis

I modeled the taxonomy workflow as an Activity Theory system (Subject, Object, Tools, Rules, Community, Division of Labor) and identified systemic contradictions. The broken toolchain between Tilia (data entry) and Explorer (discovery) meant that metadata quality degraded as data moved through the pipeline. This wasn't a UI problem. It was an infrastructure problem that a visualization tool could partially address by making hierarchical relationships and synonym histories visible.


A radial tree that handles 60,000 nodes without drowning the user

03
Radial Dendrogram with Two-Level Hierarchy Control

The core visualization is a radial dendrogram (D3.js cluster layout) representing the full taxonomic hierarchy: Phylum, Class, Order, Family, Genus, Species. The default view shows major taxon groups as terminal nodes. Selecting a group drills into its full sub-tree. This two-level approach prevents the "wall of nodes" problem while preserving the ability to explore any branch in full detail.
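
The two-level control can be sketched roughly as follows. This is a minimal illustration, not the production code: the `TaxonNode` shape, the function names, and the toy data are all assumptions made for the example, and it assumes the "major groups" sit at a fixed depth below the root.

```typescript
// Hypothetical node shape; the real Neotoma schema may differ.
interface TaxonNode {
  id: number;
  name: string;
  children: TaxonNode[];
}

// Overview: copy the tree but cut it off at `maxDepth`, so nodes at that
// depth become terminal nodes even if they have descendants.
function pruneToDepth(node: TaxonNode, maxDepth: number, depth: number = 0): TaxonNode {
  const children = depth < maxDepth
    ? node.children.map((child) => pruneToDepth(child, maxDepth, depth + 1))
    : [];
  return { id: node.id, name: node.name, children };
}

// Drill-down: locate the selected group in the full tree and return its sub-tree.
function findSubtree(node: TaxonNode, id: number): TaxonNode | undefined {
  if (node.id === id) return node;
  for (const child of node.children) {
    const hit = findSubtree(child, id);
    if (hit !== undefined) return hit;
  }
  return undefined;
}

// Helper used only to inspect the results below.
function countNodes(node: TaxonNode): number {
  return 1 + node.children.reduce((sum, child) => sum + countNodes(child), 0);
}

// Toy data for illustration only.
const fullTree: TaxonNode = {
  id: 1, name: "All taxa", children: [
    { id: 2, name: "Vascular plants", children: [
      { id: 3, name: "Pinaceae", children: [
        { id: 4, name: "Pinus", children: [
          { id: 5, name: "Pinus strobus", children: [] },
        ] },
      ] },
    ] },
    { id: 6, name: "Mammals", children: [
      { id: 7, name: "Bison", children: [] },
    ] },
  ],
};

const overview = pruneToDepth(fullTree, 1); // root plus two terminal groups
const drilled = findSubtree(fullTree, 2);   // full "Vascular plants" sub-tree
console.log(countNodes(overview));              // 3
console.log(drilled ? countNodes(drilled) : 0); // 4
```

Either result could then be handed to a layout step. Pruning a copy rather than mutating the full tree keeps the drill-down path available at all times.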

04
Search, Synonym Awareness, and Ancestor Highlighting

Users can search any taxon by name or ID. The system highlights the full ancestor path (from the root down to the matched taxon), auto-rotates the tree to center the result, and displays the complete lineage in a side panel. Synonym-aware search surfaces related names and their validity status, directly addressing the "lumping vs. splitting" contradiction from Phase I.
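
A rough sketch of the two mechanics behind a search hit: walking parent links to build the lineage, and computing how far to rotate the tree so the hit lands at a target angle. The `TaxonRow` shape, the function names, and the convention that angles are in degrees with 0 as the target are assumptions for illustration, not the prototype's actual API.

```typescript
// Hypothetical flat row: each taxon points at its parent (null for the root).
interface TaxonRow {
  id: number;
  name: string;
  parentId: number | null;
}

// Walk parent links from the matched taxon up to the root, returning the
// lineage in root-first order. The `seen` set guards against cyclic data.
function ancestorPath(rows: Map<number, TaxonRow>, id: number): TaxonRow[] {
  const path: TaxonRow[] = [];
  const seen = new Set<number>();
  let current = rows.get(id);
  while (current !== undefined && !seen.has(current.id)) {
    seen.add(current.id);
    path.unshift(current);
    current = current.parentId === null ? undefined : rows.get(current.parentId);
  }
  return path;
}

// Rotation in degrees that brings a node at `nodeAngle` to `targetAngle`,
// normalized to (-180, 180] so the tree turns the short way round.
function rotationToCenter(nodeAngle: number, targetAngle: number = 0): number {
  const delta = (((targetAngle - nodeAngle) % 360) + 360) % 360; // in [0, 360)
  return delta > 180 ? delta - 360 : delta;
}

// Toy data for illustration only.
const taxa: TaxonRow[] = [
  { id: 1, name: "Plantae", parentId: null },
  { id: 2, name: "Pinaceae", parentId: 1 },
  { id: 3, name: "Pinus", parentId: 2 },
  { id: 4, name: "Pinus strobus", parentId: 3 },
];
const byId = new Map<number, TaxonRow>(
  taxa.map((row): [number, TaxonRow] => [row.id, row]),
);

const lineage = ancestorPath(byId, 4).map((row) => row.name).join(" > ");
console.log(lineage);               // Plantae > Pinaceae > Pinus > Pinus strobus
console.log(rotationToCenter(90));  // -90
console.log(rotationToCenter(350)); // 10
```

The IDs on the returned path are what a highlighting step would match against links, and the same list can feed a lineage side panel.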

05
Performance: Making 60,000 Nodes Renderable

Rendering a 60,000-node tree in the browser required deliberate performance engineering. I used PostgreSQL materialized views to precompute ancestor paths (eliminating expensive recursive queries at render time), angle-based label culling that progressively reveals text on zoom, and a modular architecture that keeps data processing, layout, highlighting, and interaction independent of one another. The result: smooth rotation, zoom, and pan on a tree that would crash a naive D3 implementation.
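
One way angle-based label culling can work, as a sketch under stated assumptions: a label is shown only if the on-screen arc between it and the previously shown label is at least a minimum gap, so zooming in reveals more text. The `Leaf` shape, the greedy strategy, and every parameter name here are illustrative assumptions; the prototype's actual rule may differ. For simplicity the sketch ignores the wrap-around between the last and first label.

```typescript
// Hypothetical leaf: a label and its angular position in degrees.
interface Leaf {
  name: string;
  angleDeg: number;
}

// Greedy culling: walk the leaves in angular order and keep a label only when
// the arc back to the last kept label is at least `minGapPx` on screen.
// Arc length = angle difference (radians) * radius * zoom factor.
function visibleLabels(leaves: Leaf[], radiusPx: number, zoom: number, minGapPx: number): Leaf[] {
  const sorted = leaves.slice().sort((a, b) => a.angleDeg - b.angleDeg);
  const shown: Leaf[] = [];
  let lastAngle = -Infinity; // guarantees the first label is always kept
  for (const leaf of sorted) {
    const arcPx = ((leaf.angleDeg - lastAngle) * Math.PI / 180) * radiusPx * zoom;
    if (arcPx >= minGapPx) {
      shown.push(leaf);
      lastAngle = leaf.angleDeg;
    }
  }
  return shown;
}

// Toy data: 360 leaves, one per degree, on a 300 px radius.
const leaves: Leaf[] = [];
for (let i = 0; i < 360; i++) {
  leaves.push({ name: "taxon-" + i, angleDeg: i });
}

console.log(visibleLabels(leaves, 300, 1, 12).length); // 120 (every third label)
console.log(visibleLabels(leaves, 300, 2, 12).length); // 180 (every second label)
console.log(visibleLabels(leaves, 300, 3, 12).length); // 360 (all labels)
```

Because the check depends only on angle, radius, and zoom, it can run on every zoom event without measuring rendered text.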


What broke when real scientists used it

06
Task-Based Think-Aloud Protocol

I designed three evaluation tasks mapping to real research workflows: (1) exploring the hierarchy to find a taxon's placement, (2) investigating synonym and validity status, and (3) determining higher-level context for a given species. Five participants completed the tasks while thinking aloud, with screen recording and mouse tracking.

Task 3 (interpreting context) had the highest success rate and lowest completion time (~15.5s). The radial layout made hierarchical context immediately visible. Task 2 (synonym/validity) had the lowest success rate (82.9%), revealing that domain-specific terminology created a "wall of jargon" for non-experts.

07
Mouse-Tracking Heatmap Analysis

Beyond task metrics, I analyzed mouse movement heatmaps to understand where users' attention went. The search bar dominated early exploration, confirming a qualitative finding: users defaulted to search as a compensatory strategy when visual navigation became cluttered or overwhelming. This fed directly into design refinements: reducing visual noise in the default view, improving label readability, and making the search-to-tree connection more prominent.
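
For readers unfamiliar with how such heatmaps are built, here is a minimal sketch of the underlying idea: pointer samples are binned into a grid and the counts per cell become the heat values. The `PointerSample` shape, the cell size, and the coordinates are invented for illustration; this is not the study's analysis pipeline or its data.

```typescript
// Hypothetical pointer sample in page pixels.
interface PointerSample {
  x: number;
  y: number;
}

// Bin samples into square cells of `cellPx` pixels. Keys are "col,row";
// only cells that received at least one sample are stored.
function binSamples(samples: PointerSample[], cellPx: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (const s of samples) {
    const key = Math.floor(s.x / cellPx) + "," + Math.floor(s.y / cellPx);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Return the key of the cell with the most samples, if any.
function hottestCell(counts: Map<string, number>): string | undefined {
  let best: string | undefined;
  let bestCount = 0;
  counts.forEach((count, key) => {
    if (count > bestCount) {
      best = key;
      bestCount = count;
    }
  });
  return best;
}

// Made-up samples: three near the top-left corner, two elsewhere.
const samples: PointerSample[] = [
  { x: 20, y: 10 }, { x: 35, y: 18 }, { x: 48, y: 5 },
  { x: 400, y: 300 }, { x: 410, y: 310 },
];
const heat = binSamples(samples, 50);
console.log(heat.get("0,0"));   // 3
console.log(hottestCell(heat)); // 0,0
```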

08
Qualitative Coding of User Feedback

Open, axial, and selective coding of the think-aloud transcripts surfaced three themes: the "wall of jargon" problem (non-experts struggled with domain text), search as compensatory strategy (visual navigation abandonment), and visual ergonomics (physical discomfort with dense label regions leading to avoidance of advanced features). Each theme mapped to specific interface refinements in the next iteration.


What stayed with me

Visualization is information architecture

Choosing a radial dendrogram over a sunburst or a flat list wasn't an aesthetic decision. It was an IA decision about how taxonomic depth, branching factor, and synonym relationships should be encoded spatially. The layout IS the information architecture. Every encoding choice (radius for rank, angle for position among siblings, color for group, link highlighting for ancestry) determines what questions users can answer without thinking.

Theory earns its keep in complex domains

In most product work, you can skip formal theory and go straight to user stories. In paleoecology, where workflows span decades of disciplinary convention, Activity Theory gave me a framework to name contradictions that users experienced but couldn't articulate. "The toolchain is broken" is actionable in a way that "I wish it were easier" is not.

Performance is a UX decision

Materialized views, label culling, and modular architecture aren't just engineering details. They are design decisions. A tree that takes 8 seconds to render is a tree nobody will use. Making 60,000 nodes feel instant required design-level thinking about what to show and when, not just how fast the code runs.
