test: add e2e tests for Crawlee crawlers as Apify Actors #784
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## master #784 +/- ##
==========================================
- Coverage 82.06% 81.57% -0.49%
==========================================
Files 46 46
Lines 2698 2698
==========================================
- Hits 2214 2201 -13
- Misses 484 497 +13
Flags with carried forward coverage won't be shown.
force-pushed from df6a90a to 3f14ef3
@janbuchar, @Mantisus - not necessarily requesting a review, that's up to you; mostly just keeping you informed about the SDK test suite improvements.
Add 6 e2e tests (one per crawler type) verifying that each Crawlee crawler works correctly when deployed as an Actor on the Apify platform. Each test exercises link discovery, data extraction (push_data), and KVS storage against a local 5-page e-commerce test server. Crawlers covered: BasicCrawler, HttpCrawler, BeautifulSoupCrawler, ParselCrawler, PlaywrightCrawler, AdaptivePlaywrightCrawler.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
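As a stdlib-only illustration of the extraction step these tests verify, a small parser can pull product titles out of a listing page. The markup (`<h3 class="product-title">`) is an assumption for the sketch, not the real test-server fixture:

```python
from html.parser import HTMLParser


class ProductTitleParser(HTMLParser):
    """Collect text from <h3 class="product-title"> elements (assumed markup)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.titles: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if tag == 'h3' and ('class', 'product-title') in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'h3':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.titles.append(data.strip())
```

In the real tests this role is played by the crawler's own parsing layer (BeautifulSoup, Parsel, Playwright selectors, etc.); the sketch only shows what "data extraction" means here.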
The Playwright browser process uses ~244 MB at startup, exceeding the 256 MB default. Both the PlaywrightCrawler and AdaptivePlaywrightCrawler tests timed out due to memory pressure. Add a memory_mbytes parameter to make_actor and set it to 1024 MB for the Playwright tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
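The memory bump can be expressed as a small helper; the function name is hypothetical, while the 256 MB default and the 1024 MB Playwright value come from the commit message:

```python
def actor_memory_mbytes(crawler_type: str) -> int:
    """Pick the Actor memory limit for an e2e test run (hypothetical helper).

    The Playwright browser process alone needs ~244 MB at startup, which
    exceeds the 256 MB platform default, so browser-based crawlers get 1 GB.
    """
    if 'playwright' in crawler_type.lower():
        return 1024
    return 256
```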
…wler e2e tests

Move Actor source code from triple-quoted string constants into standalone files under actor_source/, so they benefit from syntax highlighting, linting, and type-checking. Load them at runtime via Path.read_text() helpers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These functions are imported across modules, so they are part of the test package's public API and shouldn't use the private naming convention.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…traints

The env var was never set anywhere. Use sys.version_info directly. Also drop version constraints from additional_requirements in the e2e tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
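Deriving the version directly from the interpreter might look like this (helper name is an assumption):

```python
import sys


def running_python_version() -> str:
    """Return 'major.minor' of the current interpreter (hypothetical helper).

    Reads sys.version_info directly instead of an environment variable
    that was never set anywhere.
    """
    return f'{sys.version_info.major}.{sys.version_info.minor}'
```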
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r e2e tests

Consolidate the two separate server.py files (actor_source_base and test_crawlee_crawlers/actor_source) into a single base server with a category-based depth structure and an infinite /deep/N chain. Add max_crawl_depth=2 to all crawler constructors to test depth limiting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
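The infinite /deep/N chain can be sketched as a page generator in which page N links only to page N+1, so the crawl terminates only through max_crawl_depth (the exact markup the shared server emits is an assumption):

```python
def deep_page_html(n: int) -> str:
    """Render page N of the infinite /deep/N chain (assumed markup).

    Each page links to the next one, so without a depth limit the
    crawl would never end; max_crawl_depth=2 cuts it off.
    """
    return (
        f'<html><body><h1>Depth {n}</h1>'
        f'<a href="/deep/{n + 1}">deeper</a></body></html>'
    )
```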
Add direct product links to the base server homepage so Scrapy spiders (which look for /products/ links on the start page) work without their own server.py. Now all e2e tests share a single server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update test_actor_on_platform_max_crawl_depth to use the /deep/N URL pattern from the shared server instead of the old infinite pagination URLs that no longer exist.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
force-pushed from 9f3cb17 to 5277d25
Maybe I'll be able to check this out in more detail later; for now I'll just share one thing - we should collocate the "Test Actors" with their test cases instead of having two big folders, actors and tests.
I absolutely agree. However, since this PR only adds new E2E tests to the existing setup, I'd prefer not to change the structure here. Let's handle this in a dedicated PR and discuss the structure beforehand.
Summary
- Each test exercises link discovery (enqueue_links/add_requests), data extraction (push_data), and KVS storage (Actor.set_value).
- conftest.py provides an ASGI test server, a Playwright Dockerfile template, product data expectations, and a _verify_crawler_results helper that checks run status, dataset contents, and KVS records.

Motivation
Issue
Test plan
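A minimal sketch of the verification helper described in the summary; the data shapes are assumptions for illustration, while the real helper works against the Apify client's run, dataset, and key-value store objects:

```python
def verify_crawler_results(run, dataset_items, kvs_record, min_items=1):
    """Check the three things each e2e test asserts (hypothetical shapes):

    the Actor run succeeded, the dataset holds at least min_items scraped
    items, and the key-value store record exists.
    """
    assert run['status'] == 'SUCCEEDED', f"run ended as {run['status']}"
    assert len(dataset_items) >= min_items, 'dataset has too few items'
    assert kvs_record is not None, 'missing key-value store record'
```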