2019 to 2024 · Sunnyvale, California

John Deere

Senior Software Engineer, Data and Computer Vision Platform

At John Deere's AI division, Blue River Technology, I built platform systems for in-house labeling, AWS machine learning operations, developer kits, Databricks, and production computer vision data pipelines.

field to model loop

From field-camera data to trained models

At Blue River Technology, I built the path from machine cameras to model updates: labeling workflows, Databricks datasets, AWS validation, live tests, and production feedback.

1. field data

2. labeling workflows

3. Databricks tables

4. AWS training workflows

5. live validation

6. production feedback

Platform work

Platform layers behind production computer vision.

01 · Labeling operations

In-house labeling tool and job manager

An internal annotation platform for turning raw field imagery into usable training data. The work covered the product surface, backend services, job dispatch, review states, schema evolution, and the handoff from labeled data into computer vision datasets.

Mode

Human-in-loop

Flow

Jobs to datasets

Annotation APIs, review queues, assignment logic, and worker-backed job processing.
Quality-control loops so labels could be reviewed, enriched, corrected, and reused.
Dataset handoff paths that made labeled frames easier to find, train on, and debug.
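
A minimal sketch of the review loop as a state machine, offered as an illustration and not as the production code. The state names, event names, and the `advance` helper are all assumptions made for this example.

```python
from enum import Enum


class JobState(Enum):
    """Hypothetical lifecycle states for a labeling job."""
    QUEUED = "queued"
    IN_PROGRESS = "in progress"
    IN_REVIEW = "in review"
    DONE = "done"


# Allowed transitions. A review reject sends the job back for relabeling.
TRANSITIONS = {
    (JobState.QUEUED, "assign"): JobState.IN_PROGRESS,
    (JobState.IN_PROGRESS, "submit"): JobState.IN_REVIEW,
    (JobState.IN_REVIEW, "approve"): JobState.DONE,
    (JobState.IN_REVIEW, "reject"): JobState.IN_PROGRESS,
}


def advance(state: JobState, event: str) -> JobState:
    """Return the next state, or raise if the event is not valid here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"cannot {event!r} a job that is {state.value}") from None
```

Keeping the transitions in one table makes the reject path explicit and keeps a finished job from being silently reopened.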

Figure: annotation canvas (human in the loop). Classes: crop · weed · soil. Example frame 0488 · 3 of 5 labeled · tool: bounding box · crop 0.98 · weed 0.87 · polygon.

Figure: job manager (review queue · 24). Raw frames (queued · 1,420 in) → Label job (in progress · 6 workers) → Quality review (in review · 2 reviewers) → Dataset handoff (done · to computer vision).

Review rejects loop back to relabel · enrich · correct · reuse.

02 · Machine learning operations

First shared AWS community cluster for machine learning operations

I helped turn machine learning infrastructure from scattered project machinery into a shared operating layer. That meant standing up the first community cluster on AWS and connecting orchestration, continuous integration, artifacts, releases, and data access into a workflow teams could repeat.

Cloud

AWS

Scope

Shared cluster

Kubeflow and Argo-style workflow orchestration for repeatable training and evaluation runs.
Continuous integration, artifact management, and deployment paths around fast-moving machine learning code.
A practical bridge between research notebooks, scheduled jobs, and production-facing pipelines.
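
To illustrate what "repeatable" means for a training and evaluation run, here is a small sketch that executes steps in dependency order. It uses Python's standard `graphlib` as a stand-in for a workflow engine and is not Kubeflow or Argo code. The step names and the `run` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "build": set(),
    "train": {"build"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def run(dag, steps):
    """Run each step after its dependencies, passing their results along."""
    results = {}
    for name in TopologicalSorter(dag).static_order():
        inputs = {dep: results[dep] for dep in dag[name]}
        results[name] = steps[name](inputs)
    return results


if __name__ == "__main__":
    # Toy step functions that only record what they received.
    steps = {
        "build": lambda inputs: "image:abc123",
        "train": lambda inputs: f"model from {inputs['build']}",
        "evaluate": lambda inputs: {"model": inputs["train"], "passed": True},
        "deploy": lambda inputs: inputs["evaluate"]["passed"],
    }
    print(run(PIPELINE, steps))
```

Declaring dependencies as data, instead of encoding them in a script's call order, is what lets the same run be repeated, scheduled, or partially rerun.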

Figure: machine learning operations as a shared operating layer.

Before · scattered: Perception, Mapping, and Agronomy each on their own machinery.

After · community cluster: one shared AWS cluster, first of its kind.

Repeatable training / evaluation pipeline: Commit (code) → Build (test) → Run (Argo) → Train (job) → Evaluate (metrics, models) → Deploy (serving). Evaluation feeds the next run.

Platform services wired into every run: continuous integration (fast-moving code) · artifact registry (models + data) · data access (lake + tables) · deployment (serving paths).

03 · Databricks lakehouse

Real-time change data capture and transformation on production-scale data

Databricks became one of the core places where raw operational data could become model-ready signal. I worked on the lakehouse patterns around change data capture, near-real-time transformations, enriched tables, notebook workflows, and repeatable debugging paths.

Pattern

Change data capture

Latency

Near real time

Change streams captured from operational systems and converted into usable analytical tables.
Real-time transformations that made fresh field and production events available for computer vision workflows.
Databricks notebooks and jobs for exploration, enrichment, validation, and incident debugging.
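
The core idea of change data capture can be sketched in a few lines: replay insert, update, and delete events in order to rebuild the current state of a table. This is plain Python for illustration, not the Delta or Databricks API. The event shape (`seq`, `op`, `key`, `row`) and the `apply_changes` helper are assumptions.

```python
from typing import Any


def apply_changes(
    table: dict[str, dict[str, Any]],
    events: list[dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Apply change events in sequence order and return the new table state."""
    result = dict(table)  # leave the caller's table untouched
    for event in sorted(events, key=lambda e: e["seq"]):
        op, key = event["op"], event["key"]
        if op == "delete":
            result.pop(key, None)  # deleting a missing row is a no-op
        elif op in ("insert", "update"):
            # Merge changed columns over whatever the row held before.
            result[key] = {**result.get(key, {}), **event["row"]}
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


if __name__ == "__main__":
    events = [
        {"seq": 2, "op": "update", "key": "f1", "row": {"crop": "corn"}},
        {"seq": 1, "op": "insert", "key": "f1", "row": {"crop": "soy", "acres": 40}},
        {"seq": 3, "op": "insert", "key": "f2", "row": {"crop": "cotton"}},
        {"seq": 4, "op": "delete", "key": "f2"},
    ]
    print(apply_changes({}, events))
```

Sorting by a sequence number before applying is what makes late or out-of-order events safe to replay.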

Figure: streaming lakehouse pipeline (near real time).

Sources (operational systems): operational database (row changes) · field metadata (machine context) · machine + model outputs (inference events).

Change data capture stream: + insert · ~ update · − delete.

Transform · medallion (ascending quality): 01 raw (landed change events) → 02 refined (cleaned + deduplicated) → 03 enriched (joined + feature-ready). Enriched Delta tables → model-ready signal.

Consumers: computer vision workflows (fresh field + production events) · notebooks + jobs (explore · enrich · validate · debug).

operational change → captured → transformed → model-ready signal

04 · Developer experience

Internal developer kits and live testing features

The platform needed sharp tools, not just infrastructure. I built reusable developer kits and test harnesses that helped engineers query the lake, stage examples, exercise APIs, inspect images, and run live testing features against realistic data paths.

Users

Engineers

Surface

APIs + notebooks

Reusable clients for data lake queries, service connectors, image APIs, and database access.
Live testing features that let teams validate field-facing behavior without waiting for a full release loop.
Notebook-first utilities that made repeated computer vision investigations faster and less fragile.
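
One check a live test harness typically needs is whether a predicted box agrees with its label. A common way to measure that is intersection over union, sketched below. The `Box` type, the `compare` helper, and the 0.5 threshold are assumptions for illustration, not the kit's actual interface.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box with (x1, y1) top-left and (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, from 0.0 to 1.0."""
    width = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    height = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    intersection = width * height
    union = a.area() + b.area() - intersection
    return intersection / union if union > 0 else 0.0


def compare(prediction: Box, label: Box, threshold: float = 0.5) -> bool:
    """Treat the prediction as a match when the overlap meets the threshold."""
    return iou(prediction, label) >= threshold
```

A single overlap number per frame is easy to tabulate in a notebook, which makes regressions visible without a full release loop.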

Figure: dev kit session (query · stage · predict · live test).

devkit · console

› kit.lake.query("field_run_id = 4821")

↳ 1,284 frames

› kit.images.stage(sample_set)

↳ staged · 64

› kit.client.api.predict(frame)

↳ 1 box · 0.94

› kit.live_test.compare(prediction, label)

↳ match · pass

Image inspector · frame 0188: prediction 0.94 · label · crop · prediction vs label: match

Query result · 4 rows · overlap 0.91: frame_0188 · 0.94 ✓ · frame_0189 · 0.91 ✓ · frame_0190 · 0.88 ✓

Tools: lake client (query) · image API (predict) · notebook (investigate) · test harness (validate)

05 · Field-to-model pipelines

Petabyte-scale computer vision data from test fields and production tractors

The core challenge was not just storing images. It was making huge streams from test fields and production spray tractors searchable, enrichable, debuggable, and useful for model development across the whole computer-vision lifecycle.

Data

Petabytes

Sources

Fields + tractors

Pipelines for ingesting, organizing, enriching, and querying field imagery and machine events.
Debugging loops that connected model failures back to frames, labels, metadata, and field context.
Data access patterns spanning lake queries, staged datasets, APIs, and machine-learning-ready feature views.
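
As a rough illustration of the debugging loop, the sketch below joins model failures back to frame metadata and counts them per field, so the worst-affected field surfaces first. The record shapes, the `field_id` key, and the `failures_by_field` helper are hypothetical.

```python
from collections import Counter


def failures_by_field(failures, frame_index):
    """Count failures per field, most affected field first.

    failures: iterable of dicts that each carry a "frame_id".
    frame_index: dict mapping frame_id to metadata with a "field_id".
    """
    counts = Counter()
    for failure in failures:
        meta = frame_index.get(failure["frame_id"])
        # Frames missing from the index are kept visible, not dropped.
        field = meta["field_id"] if meta else "unknown"
        counts[field] += 1
    return counts.most_common()


if __name__ == "__main__":
    frame_index = {
        "a": {"field_id": "north"},
        "b": {"field_id": "north"},
        "c": {"field_id": "south"},
    }
    failures = [{"frame_id": "a"}, {"frame_id": "b"}, {"frame_id": "c"}, {"frame_id": "zzz"}]
    print(failures_by_field(failures, frame_index))
```

Counting unmatched frames under "unknown" keeps gaps in the metadata from hiding failures.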

Figure: field → model lifecycle (petabyte-scale).

Sources · streaming imagery + machine events: test fields (18 rigs) · spray tractors (fleet · live).

Ingest · 4.2 million frames / day

Data lake · organized + searchable (enrich → query): each raw frame is enriched with geolocation, machine event, crop + weed labels, and field context.

Access patterns: lake queries · staged datasets · image interfaces · machine-learning-ready feature views.

Model development: train · evaluate · ship.

Debug loop: a model failure resolves back to frames · labels · metadata · field context. Failures feed re-enrichment.

ingest → enrich → query → train → debug → re-enrich