2019 to 2024 · Sunnyvale, California

John Deere

Senior Software Engineer, Data and Computer Vision Platform

At John Deere's AI division, Blue River Technology, I built platform systems for in-house labeling, AWS machine learning operations, developer kits, Databricks, and production computer vision data pipelines.

field to model loop

From field-camera data to trained models

At Blue River Technology, I built the path from machine cameras to model updates: labeling workflows, Databricks datasets, AWS validation, live tests, and production feedback.

1. field data
2. labeling workflows
3. Databricks tables
4. AWS training workflows
5. live validation
6. production feedback

Platform work

Platform layers behind production computer vision.

01 · Labeling operations

In-house labeling tool and job manager

An internal annotation platform for turning raw field imagery into usable training data. The work covered the product surface, backend services, job dispatch, review states, schema evolution, and the handoff from labeled data into computer vision datasets.

Mode: Human-in-loop · Flow: Jobs to datasets

  • Annotation APIs, review queues, assignment logic, and worker-backed job processing.
  • Quality-control loops so labels could be reviewed, enriched, corrected, and reused.
  • Dataset handoff paths that made labeled frames easier to find, train on, and debug.

annotation canvas · human in the loop
classes: crop · weed · soil
sample labels: crop · 0.98, weed · 0.87 (polygon)
frame 0488 · 3 of 5 labeled · tool · bounding box

job manager · review queue · 24
01 Raw frames (queued · 1,420 in) → 02 Label job (in progress · 6 workers) → 03 Quality review (in review · 2 reviewers) → 04 Dataset handoff (done · to computer vision)
review rejects loop back to relabel · enrich · correct · reuse
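
The job manager flow above can be read as a small state machine. The following is a minimal sketch, not the internal tool's implementation: the state names come from the diagram labels (queued, in progress, in review, done), while the event names (`assign`, `submit`, `approve`, `reject`) and the `advance` helper are hypothetical and exist only for illustration.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    IN_PROGRESS = "in progress"
    IN_REVIEW = "in review"
    DONE = "done"


# Hypothetical transition table. A rejected review sends the job
# back to labeling instead of forward to the dataset handoff.
TRANSITIONS = {
    (JobState.QUEUED, "assign"): JobState.IN_PROGRESS,
    (JobState.IN_PROGRESS, "submit"): JobState.IN_REVIEW,
    (JobState.IN_REVIEW, "approve"): JobState.DONE,
    (JobState.IN_REVIEW, "reject"): JobState.IN_PROGRESS,
}


def advance(state: JobState, event: str) -> JobState:
    """Return the next state, or raise if the event is invalid from `state`."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"cannot apply {event!r} to a job that is {state.value}"
        ) from None


if __name__ == "__main__":
    state = JobState.QUEUED
    # One rejected review, then a successful second pass.
    for event in ["assign", "submit", "reject", "submit", "approve"]:
        state = advance(state, event)
    print(state.value)
```

Keeping the transitions in one table makes the reject loop explicit and makes invalid moves, such as reassigning a finished job, fail loudly.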

02 · Machine learning operations

First shared AWS community cluster for machine learning operations

I helped turn machine learning infrastructure from scattered project machinery into a shared operating layer. That meant standing up the first community cluster on AWS and connecting orchestration, continuous integration, artifacts, releases, and data access into a workflow teams could repeat.

Cloud: AWS · Scope: Shared cluster

  • Kubeflow and Argo-style workflow orchestration for repeatable training and evaluation runs.
  • Continuous integration, artifact management, and deployment paths around fast-moving machine learning code.
  • A practical bridge between research notebooks, scheduled jobs, and production-facing pipelines.

Machine learning operations · shared operating layer

Before · scattered: Perception, Mapping, and Agronomy each ran their own machinery.

After · community cluster: one shared AWS cluster, first of its kind.

Repeatable training / evaluation pipeline: 01 Commit (code) → 02 Build (test) → 03 Run (Argo) → 04 Train (job) → 05 Evaluate (metrics) → 06 Register (models) → 07 Deploy (serving). Evaluation feeds the next run.

Platform services wired into every run: Continuous integration (fast-moving code) · Artifact registry (models + data) · Data access (lake + tables) · Deployment (serving paths)

03 · Databricks lakehouse

Real-time change data capture and transformation on production-scale data

Databricks became one of the core places where raw operational data could become model-ready signal. I worked on the lakehouse patterns around change data capture, near-real-time transformations, enriched tables, notebook workflows, and repeatable debugging paths.

Pattern: Change data capture · Latency: Near real time

  • Change streams captured from operational systems and converted into usable analytical tables.
  • Real-time transformations that made fresh field and production events available for computer vision workflows.
  • Databricks notebooks and jobs for exploration, enrichment, validation, and incident debugging.
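
To make the change-data-capture step concrete, here is a minimal plain-Python sketch of collapsing a change stream into current rows. It illustrates the general idea only and is not the Databricks pipeline itself. The event shape (`key`, `seq`, `op`, `row`) and the `apply_changes` helper are assumptions made for this example.

```python
from typing import Any, Dict, Iterable, List

ChangeEvent = Dict[str, Any]


def apply_changes(events: Iterable[ChangeEvent]) -> List[Dict[str, Any]]:
    """Collapse a change stream into the current state of each row.

    Assumed event shape (hypothetical):
      key: row identifier
      seq: sequence number that increases with each change
      op:  "insert", "update", or "delete"
      row: column values (ignored for deletes)
    """
    latest: Dict[Any, ChangeEvent] = {}
    for event in events:
        current = latest.get(event["key"])
        # Keep only the newest event per key; replayed duplicates
        # carry the same seq and are ignored.
        if current is None or event["seq"] > current["seq"]:
            latest[event["key"]] = event

    # Rows whose newest event is a delete are dropped from the output.
    return [
        {"key": key, **event["row"]}
        for key, event in sorted(latest.items(), key=lambda kv: kv[0])
        if event["op"] != "delete"
    ]


if __name__ == "__main__":
    stream = [
        {"key": 1, "seq": 1, "op": "insert", "row": {"status": "queued"}},
        {"key": 2, "seq": 2, "op": "insert", "row": {"status": "queued"}},
        {"key": 1, "seq": 3, "op": "update", "row": {"status": "labeled"}},
        {"key": 1, "seq": 3, "op": "update", "row": {"status": "labeled"}},
        {"key": 2, "seq": 4, "op": "delete", "row": {}},
    ]
    print(apply_changes(stream))
```

In a lakehouse, the same latest-event-per-key logic would typically run as a streaming merge into a table rather than as an in-memory dictionary.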

streaming lakehouse pipeline · near real time

sources · operational systems: operational database (row changes) · field metadata (machine context) · machine + model outputs (inference events)

change data capture stream: insert (+) · update (~) · delete

transform · medallion (ascending quality): 01 raw (landed change events) → 02 refined (cleaned + deduplicated) → 03 enriched (joined + feature-ready)

enriched Delta tables → model-ready signal

consumers: computer vision workflows (fresh field + production events) · notebooks + jobs (explore · enrich · validate · debug)

operational change → captured → transformed → model-ready signal

04 · Developer experience

Internal developer kits and live testing features

The platform needed sharp tools, not just infrastructure. I built reusable developer kits and test harnesses that helped engineers query the lake, stage examples, exercise APIs, inspect images, and run live testing features against realistic data paths.

Users: Engineers · Surface: APIs + notebooks

  • Reusable clients for data lake queries, service connectors, image APIs, and database access.
  • Live testing features that let teams validate field-facing behavior without waiting for a full release loop.
  • Notebook-first utilities that made repeated computer vision investigations faster and less fragile.

dev kit session · query · stage · predict · live test

devkit · console
kit.lake.query("field_run_id = 4821")
1,284 frames
kit.images.stage(sample_set)
staged · 64
kit.client.api.predict(frame)
1 box · 0.94
kit.live_test.compare(prediction, label)
match · pass

image inspector · frame 0188
prediction 0.94 · label · crop
prediction vs label: match

query result · 4 rows · overlap 0.91
frame_0188 · 0.94
frame_0189 · 0.91
frame_0190 · 0.88

lake client (query) · image API (predict) · notebook (investigate) · test harness (validate)

05 · Field-to-model pipelines

Petabyte-scale computer vision data from test fields and production tractors

The core challenge was not just storing images. It was making huge streams from test fields and production spray tractors searchable, enrichable, debuggable, and useful for model development across the whole computer-vision lifecycle.

Data: Petabytes · Sources: Fields + tractors

  • Pipelines for ingesting, organizing, enriching, and querying field imagery and machine events.
  • Debugging loops that connected model failures back to frames, labels, metadata, and field context.
  • Data access patterns spanning lake queries, staged datasets, APIs, and machine-learning-ready feature views.

field → model lifecycle · petabyte-scale

sources · streaming: imagery + machine events
test fields · 18 rigs
spray tractors · fleet · live
ingest · 4.2 million frames / day

data lake · organized + searchable · enrich → query
raw frame enriched with: geolocation · machine event · crop + weed labels · field context

access patterns: lake queries · staged datasets · image interfaces · machine-learning-ready feature views

model development: train · evaluate · ship
model failure resolves back to frames · labels · metadata · field context
debug loop · failures feed re-enrichment

ingest → enrich → query → train → debug → re-enrich