Celery & Task Management
Pulsar utilizes Celery as its distributed task queue to handle the asynchronous nature of network scanning. Because discovery tasks (such as subdomain enumeration and vulnerability scanning) can take anywhere from minutes to hours, they are offloaded to background workers to ensure the web interface remains responsive.
Background Scanning Architecture
When a scan is initiated through the Pulsar UI or API, the system creates a task in the queue. Celery workers pick up these tasks and execute the integrated tools (Amass, Nmap, ZMap, etc.) sequentially or in parallel based on your configuration.
The core logic is handled by two primary task types:
- dispatch_scan: Orchestrates the overall scanning process for a target organization.
- run_scan: Executes specific scan modules and collectors against identified assets.
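The division of labor between these two tasks can be sketched in plain Python: the dispatcher enumerates assets and fans out one run per asset and module. This is an illustrative sketch of the pattern, not Pulsar's actual task bodies (asset enumeration is stubbed, and the real tasks are Celery tasks):

```python
def run_scan(asset: str, module: str) -> dict:
    """Execute a single scan module against one asset (stubbed here)."""
    # In Pulsar this would invoke a collector such as Amass or Nmap.
    return {"asset": asset, "module": module, "status": "SUCCESS"}

def dispatch_scan(target: str, modules: list[str]) -> list[dict]:
    """Orchestrate a full scan: enumerate assets, then fan out run_scan."""
    # Asset enumeration is stubbed; a real run would discover subdomains first.
    assets = [target, f"www.{target}"]
    results = []
    for asset in assets:
        for module in modules:
            results.append(run_scan(asset, module))
    return results

results = dispatch_scan("example.com", ["amass", "nmap"])
print(len(results))  # 2 assets x 2 modules = 4 results
```

In the real system the fan-out happens asynchronously, so a slow module against one asset does not block the others.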
Managing Scans via the API
You can manage and monitor task execution through the REST API. Most task interactions are performed via the ScanInstance and AssetInstance endpoints.
Triggering a Scan
To start a new discovery task, send a POST request to the scan endpoint:
```http
POST /api/scans/
Content-Type: application/json

{
  "target": "example.com",
  "policy": "full_discovery",
  "comment": "Monthly footprint scan"
}
```
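The same request can be issued from Python using only the standard library. The endpoint and field names come from the example above; the host and port are placeholders for your installation:

```python
import json
import urllib.request

def build_scan_request(base_url: str, target: str, policy: str,
                       comment: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /api/scans/ endpoint."""
    payload = json.dumps({
        "target": target,
        "policy": policy,
        "comment": comment,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/scans/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# urllib.request.urlopen(req) would submit it; the host is a placeholder.
req = build_scan_request("http://localhost:8000", "example.com",
                         "full_discovery", "Monthly footprint scan")
print(req.get_method(), req.full_url)
```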
Monitoring Status
Each scan returns a unique ID. Use this ID to poll the status of the background worker:
```http
GET /api/scans/{id}/
```
Response Fields:
| Field | Type | Description |
| :--- | :--- | :--- |
| status | string | Current state (PENDING, STARTED, SUCCESS, FAILURE). |
| progress | float | Percentage completion based on the current policy. |
| started_at | datetime | Timestamp when the worker picked up the task. |
| task_id | uuid | The internal Celery task ID for troubleshooting. |
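A simple status poller can be written against the fields in the table above. The fetch function is injected so the loop stays independent of any particular HTTP client; the terminal states match the status values listed. The interval and retry budget are illustrative defaults, not Pulsar settings:

```python
import time

TERMINAL_STATES = {"SUCCESS", "FAILURE"}

def wait_for_scan(fetch_status, interval: float = 5.0,
                  max_polls: int = 720) -> str:
    """Poll a scan until it reaches a terminal state.

    fetch_status: callable returning the current `status` string,
    e.g. by GETting /api/scans/{id}/ and reading the JSON body.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval)
    raise TimeoutError("scan did not finish within the polling budget")

# Example with a stubbed fetcher that succeeds on the third poll:
states = iter(["PENDING", "STARTED", "SUCCESS"])
print(wait_for_scan(lambda: next(states), interval=0.0))  # SUCCESS
```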
Background Worker Configuration
To process discovery jobs, you must have both the Celery worker and the Celery Beat scheduler running. The scheduler drives the "Scheduling & Notifications" features, such as recurring scans.
Starting a Worker
In your installation directory, start a worker to begin processing the task queue:
```shell
celery -A portal worker -l info
```
Starting the Scheduler
For recurring scans and automated cleanup jobs, start the beat service:
```shell
celery -A portal beat -l info
```
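Recurring scans are defined through a beat schedule in your Django settings. The sketch below assumes the dispatch task is importable as `portal.tasks.dispatch_scan` (an assumed module path, not confirmed by Pulsar's source) and uses a plain interval in seconds, which Celery accepts directly:

```python
# Settings fragment for Celery Beat. The task path below is an assumption;
# adjust it to match your installation.
CELERY_BEAT_SCHEDULE = {
    "monthly-footprint-scan": {
        "task": "portal.tasks.dispatch_scan",
        "schedule": 30 * 24 * 60 * 60.0,  # every 30 days, in seconds
        "kwargs": {"target": "example.com", "policy": "full_discovery"},
    },
}
```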
Scan Optimization & Policies
Task execution is governed by Scan Policies. These allow you to manage how background workers handle long-running discovery jobs:
- Concurrency Control: You can scale the number of Celery workers to handle multiple organizations simultaneously.
- Resource Limits: Modify scan policies in the Pulsar UI to limit the intensity of tools like Nmap or ZMap, preventing your workers from being blacklisted by target infrastructure.
- Sandbox Execution: Pulsar uses a sandbox utility for custom scanner extensions to ensure that long-running third-party scripts do not hang the main worker process.
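The sandbox idea above, bounding how long a third-party script may run, can be approximated with a subprocess timeout. This is a generic sketch of the technique, not Pulsar's actual sandbox utility:

```python
import subprocess

def run_sandboxed(cmd: list[str], timeout_s: float) -> str:
    """Run an external scanner command, killing it if it exceeds the budget.

    Returns captured stdout, or raises subprocess.TimeoutExpired; a worker
    can catch that and mark the task FAILURE instead of hanging forever.
    """
    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=timeout_s,  # the child process is killed once this elapses
    )
    return result.stdout

print(run_sandboxed(["echo", "scan complete"], timeout_s=5.0).strip())
```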
Troubleshooting Tasks
If a scan appears stuck in a "PENDING" state:
- Check Redis/Broker: Ensure your message broker (Redis) is accessible by both the Django web container and the Celery worker.
- Worker Logs: Monitor the worker output for tool-specific errors (e.g., Nmap requiring root privileges or Amass timing out).
- Purging Tasks: If the queue is backed up with old discovery jobs, you can clear it with `celery -A portal purge`.