docs/source/execution.rst

Choosing an orchestrator
------------------------

Orchestrators are responsible for preparing the remote and collecting the results.
The complete set of orchestrators, accompanied by descriptions, can be seen by
calling ``reproman run --list=orchestrators``.

.. note::

   ReproMan's ``run`` functionality works best with DataLad. When DataLad
   is not available on a resource, only a limited set of functionality is
   available. If you are new to DataLad, consider reading the `DataLad
   handbook`_.

Choose an orchestrator based on your setup and needs:

**For remote resources with DataLad (recommended):**

- ``datalad-pair`` - Best for persistent remote datasets

  - Creates and maintains DataLad datasets on the remote
  - Commits results directly on the remote with full provenance
  - Retrieves results using `datalad update`_ and `datalad get`_
  - Marks completed jobs with git refs (``refs/reproman/JOBID``)

- ``datalad-pair-run`` - Best for capturing runs in the local dataset

  - Prepares the remote dataset like ``datalad-pair``
  - Packages results in a tarball based on file modification times
  - Creates a `datalad run`_ commit in your *local* repository
  - Marks the local commit with a git ref (``refs/reproman/JOBID``)

**For remote resources without DataLad:**

- ``datalad-local-run`` - Remote execution, local DataLad integration

  - Uses a plain remote directory (no DataLad required on the remote)
  - Captures results as a `datalad run`_ commit locally
  - Good when the remote lacks DataLad but you want local provenance

- ``plain`` - Simple remote execution

  - Basic file transfer using ``session.put()`` and ``session.get()``
  - No DataLad integration or provenance tracking
  - Creates a working directory named with the job ID
  - Sufficient for simple tasks, but the DataLad orchestrators are recommended

**For local execution:**

- ``datalad-no-remote`` - Local dataset execution

  - Executes in the current local dataset directory
  - Behaves like ``datalad-pair`` but stays local
  - Available for local shell resources only
  - Good for testing workflows locally

Revisiting :ref:`our concrete example <rr-refex>` and assuming we have
an SSH resource named "foo" in our inventory, here's how we could run
the command there.
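
A minimal sketch of such an invocation (the orchestrator choice, the
``inputs/`` and ``results/`` paths, and the script name are illustrative
placeholders, not values from the example)::

    reproman run --resource foo \
        --orchestrator datalad-pair \
        --input inputs/ \
        --output results/ \
        ./analysis.sh
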
docs/source/index.rst

.. toctree::
   :maxdepth: 1

   overview
   tutorial-ssh
   acknowledgements

Concepts and technologies

docs/source/tutorial-ssh.rst

.. _tutorial-ssh:

Tutorial: SSH Resource Workflows
*********************************

This tutorial walks you through ReproMan workflows using SSH resources,
from simple command execution to a complete data analysis. We'll start
with a basic hello-world example, then progress to processing
neuroimaging data, showing how ReproMan creates reproducible, traceable
computational workflows across SSH-accessible computing environments.

Overview
========

We'll cover two workflows:

**Part 1: Hello World Example**

1. Create a ReproMan SSH resource
2. Execute a simple command remotely
3. Fetch and examine results

**Part 2: Dataset Analysis Example**

1. Set up a DataLad dataset with input data
2. Execute MRIQC quality control analysis remotely
3. Collect and examine results with full provenance

Prerequisites
=============

For Part 1:

- ReproMan installed on your local machine (``pip install reproman``)
- Access to a remote server via SSH

For Part 2:

- DataLad support (``pip install 'reproman[full]'``)
- DataLad installed on the remote server

Part 1: Hello World Example
============================

Step 1: Create an SSH Resource
-------------------------------

First, let's add an SSH resource to ReproMan's inventory. Replace ``your-server.edu`` with your actual server::

reproman create myserver --resource-type ssh --backend-parameters host=your-server.edu
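
If your connection needs more than a host name, additional ``key=value``
pairs can be given to ``--backend-parameters``. A sketch (the parameter
names ``user`` and ``port`` here are assumptions; verify them against
``reproman create --help``)::

    reproman create myserver --resource-type ssh \
        --backend-parameters host=your-server.edu user=alice port=2222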

Verify the resource was created::

reproman ls --refresh

.. note::

The ``--refresh`` flag is needed to check the current status of resources. Without it, you'll only see cached status information.

You should see output similar to::

RESOURCE NAME TYPE ID STATUS
------------- ---- -- ------
myserver ssh 1a23b456-789c- ONLINE

Step 2: Execute a Simple Command
---------------------------------

Let's start with a simple test to verify our setup works. Create a working directory and run a basic command::

mkdir -p hello-world
cd hello-world

reproman run --resource myserver \
--submitter local \
--orchestrator plain \
--output results \
sh -c 'mkdir -p results && echo "Hello from ReproMan on $(hostname)" > results/hello.txt'


Step 3: Fetch Results
---------------------

The job will execute on the remote. To check status and fetch results::

# Check job status and get job ID
reproman jobs

# Fetch results for completed job (replace JOB_ID with actual ID)
reproman jobs JOB_ID

When you run ``reproman jobs JOB_ID``, ReproMan will automatically:

- Fetch the output files from the remote to your local working directory
- Display job information and logs
- Unregister the completed job

You should now see the results locally::

cat results/hello.txt
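
Given the command we submitted, the file should contain a line like the
following (with your server's actual hostname)::

    Hello from ReproMan on your-server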

.. note::

ReproMan creates a working directory on the remote resource automatically. By default, it uses ``~/.reproman/run-root`` on the remote. You can verify the file exists there with ``reproman login myserver``.

Part 2: Dataset Analysis Example
=================================

Now let's try a more realistic example with DataLad dataset management and neuroimaging analysis.

Step 1: Set Up the Analysis Dataset
------------------------------------

Create a new DataLad dataset for our analysis::

# Create dataset for MRIQC quality control results
datalad create -d demo-mriqc -c text2git
cd demo-mriqc

Install input data (using a demo BIDS dataset)::

# Install demo neuroimaging dataset
datalad install -d . -s https://github.com/ReproNim/ds000003-demo sourcedata/raw

.. note::
This only installs the dataset structure - the actual data files are not
downloaded locally. DataLad will automatically fetch any data specified
by ``--input`` when the analysis runs.
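
If you would like to pre-fetch the input data yourself (optional, since
``--input`` normally takes care of it), you can retrieve it with DataLad
directly::

    datalad get sourcedata/raw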


Set up the processing working directory to be ignored by git::

datalad run -m "Ignore processing workdir" 'echo "workdir/" > .gitignore'

Step 2: Execute Analysis with DataLad Integration
-------------------------------------------------

For full provenance tracking with DataLad::

reproman run --resource myserver \
--submitter local \
--orchestrator datalad-pair-run \
--input sourcedata/raw \
--output . \
bash -c 'podman run --rm -v "$(pwd):/work:rw" nipreps/mriqc:latest /work/sourcedata/raw /work/results participant group --participant-label 02'

.. note::
The ``-v "$(pwd):/work:rw"`` part mounts your current directory into the
container at ``/work``, allowing the containerized software to access the
top-level dataset.
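
This example also assumes ``podman`` is available on the remote server;
that is a prerequisite of the analysis rather than something ReproMan
installs. You can check from a login shell::

    reproman login myserver
    podman --version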

Step 3: Monitor Execution
-------------------------

ReproMan jobs run in detached mode by default. Monitor progress::

# List all jobs
reproman jobs

# Check specific job status (replace JOB_ID with actual ID)
reproman jobs JOB_ID

# Fetch completed job results
reproman jobs JOB_ID --fetch

For attached execution (wait for completion)::

reproman run --resource myserver --follow \
[... rest of command ...]

Step 4: Examine Results and Provenance
--------------------------------------

Once the job completes, examine what was captured::

# View the provenance record
git log --oneline -1

# Look at captured job information
ls .reproman/jobs/myserver/

# View job specification
cat .reproman/jobs/myserver/JOB_ID/spec.yaml

# Check MRIQC outputs
ls -la results/

The DataLad orchestrators create rich provenance records::

# View the detailed run record
git show --stat

# See what files were modified/added
git show --name-status
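
For orientation, the run record that ``datalad run`` embeds in the commit
message looks roughly like this (abbreviated and illustrative; the exact
fields and values come from your job)::

    [DATALAD RUNCMD] bash -c 'podman run --rm ...'

    === Do not change lines below ===
    {
     "cmd": "bash -c 'podman run --rm ...'",
     "exit": 0,
     "inputs": ["sourcedata/raw"],
     "outputs": ["."]
    }
    ^^^ Do not change lines above ^^^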