feat: add timestamp for scraper content by aturret · Pull Request #84 · aturret/FastFetchBot

aturret · 2026-05-24T03:33:20Z

Summary by CodeRabbit

New Features
- Added publication timestamp tracking for content from multiple platforms (Instagram, Twitter, Reddit, Weibo, Bluesky, Threads, Xiaohongshu, Zhihu, and general web sources).
- Improved date metadata consistency across all scrapers.
Tests
- Added test coverage for timestamp parsing and serialization.

coderabbitai · 2026-05-24T03:33:33Z

Warning

Review limit reached

@aturret, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 1 review/hour. Refill in 47 minutes and 49 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 84d76715-6747-4e90-a0c4-b6d88132088b

📥 Commits

Reviewing files that changed from the base of the PR and between 6ea5363 and ff68610.

📒 Files selected for processing (2)

packages/shared/fastfetchbot_shared/database/mongodb/cache.py
tests/unit/database/mongodb/test_cache.py

📝 Walkthrough

Walkthrough

This PR adds end-to-end timestamp propagation across the FastFetchBot metadata and scraper ecosystem. It introduces a timestamp field to the core MetadataItem model, extends MongoDB persistence to store timestamps as published_timestamp, and updates 15+ platform scrapers (Twitter, Weibo, Instagram, Reddit, Threads, Bluesky, Zhihu, Xiaohongshu, Douban, Wechat, Inoreader, YouTube, Bilibili) to extract and normalize platform-specific timestamp formats.

Changes

End-to-end timestamp propagation

Layer / File(s)	Summary
Core timestamp model and utilities `packages/shared/fastfetchbot_shared/models/metadata_item.py`, `packages/shared/fastfetchbot_shared/utils/parse.py`	Adds `MetadataItem.timestamp` field and `from_optional_int` parser to validate positive integer timestamps; refactors `unix_timestamp_to_utc` to use a reusable `BEIJING_TZ` timezone constant for consistent timestamp formatting.
Metadata storage and cache transformation `packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py`, `packages/shared/fastfetchbot_shared/database/mongodb/cache.py`, `packages/shared/fastfetchbot_shared/services/scrapers/common.py`	Adds `published_timestamp` field to MongoDB `Metadata` model; `save_metadata` copies and transforms input dicts to rename `timestamp` → `published_timestamp`; cache hits normalize the response by mapping `published_timestamp` back to `timestamp`.
Scraper framework integration `packages/shared/fastfetchbot_shared/services/scrapers/general/base.py`, `packages/shared/fastfetchbot_shared/services/scrapers/general/__init__.py`, `packages/shared/fastfetchbot_shared/services/scrapers/general/firecrawl.py`, `packages/shared/fastfetchbot_shared/services/scrapers/general/zyte.py`	`BaseGeneralDataProcessor._build_item_data` accepts optional `timestamp` parameter with validation (int > 0); `GeneralItem.from_dict` propagates `MetadataItem.timestamp`; Zyte scraper extracts publication date via ISO parser helper.
Video and social media scrapers `packages/shared/fastfetchbot_shared/services/file_export/video_download.py`, `packages/shared/fastfetchbot_shared/services/scrapers/twitter/__init__.py`, `packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py`, `packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py`, `packages/shared/fastfetchbot_shared/services/scrapers/reddit/__init__.py`, `packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py`, `packages/shared/fastfetchbot_shared/services/scrapers/bluesky/__init__.py`, `packages/shared/fastfetchbot_shared/services/scrapers/bluesky/scraper.py`	Each scraper initializes `self.timestamp`, extracts platform timestamps (Twitter via `parsedate_to_datetime`; Weibo/Bluesky via ISO parser with UTC default; Reddit via `created_utc` Unix epoch; Instagram via `taken_at` with milliseconds downscaling; Threads via `published_on`; video scrapers via YYYYMMDD or int validation), and stores as instance attribute.
Chinese content platform scrapers `packages/shared/fastfetchbot_shared/services/scrapers/zhihu/__init__.py`, `packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/__init__.py`	Zhihu extracts from `created` field across API and JSON scraping paths; Xiaohongshu parses `time`/`last_update_time` with milliseconds-vs-seconds detection and uses the parsed values for `created`/`updated` computation via `unix_timestamp_to_utc`.
Miscellaneous platform scrapers `packages/shared/fastfetchbot_shared/services/scrapers/douban/__init__.py`, `packages/shared/fastfetchbot_shared/services/scrapers/wechat/__init__.py`, `apps/api/src/services/inoreader/__init__.py`	Douban, Wechat, and Inoreader initialize `self.timestamp = None` and skip processing null metadata values (Wechat) or parse from constructor data (Inoreader).
Test coverage `tests/unit/database/mongodb/test_cache.py`, `tests/unit/scrapers/test_common_cache.py`, `tests/unit/scrapers/test_wechat.py`, `tests/unit/test_published_timestamp.py`	Verifies `save_metadata` maps `timestamp` → `published_timestamp` correctly; cache hits return `timestamp` from stored `published_timestamp`; `MetadataItem.to_dict()` serializes numeric timestamps but rejects string datetime values; confirms `timestamp` absence in Wechat output when not provided.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The PR introduces a common timestamp field across 25+ files with platform-specific extraction logic for 15+ social/content platforms. While individual parsers follow similar patterns (validate input → normalize to Unix seconds → store), the heterogeneity of timestamp formats (ISO strings, RFC dates, milliseconds, Unix epochs) and multiple scraper implementations across independent platforms require careful review of each platform's parsing rules, edge cases (null/invalid inputs), and propagation through the metadata pipeline. The storage layer transformation (timestamp ↔ published_timestamp) adds conceptual complexity despite simple code.

Possibly related PRs

aturret/FastFetchBot#73: Extends MongoDB Metadata model and save_metadata function—this PR builds on that foundation to map timestamps during persistence.
aturret/FastFetchBot#49: Refactors BaseGeneralDataProcessor._build_item_data in the general scraper stack—this PR adds the timestamp parameter and conditional field validation to that signature.

Poem

🐰 Through platforms vast, from tweets to posts so bright,
We gather timestamps, normalize to light—
Zhihu, Instagram, Twitter's racing beat,
Unix seconds flowing, all aligned and neat!
From storage vaults to scrapers ever keen,
Time flows through the data—pristine and serene. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 7.04% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: add timestamp for scraper content' accurately summarizes the main change—adding timestamp extraction and propagation across multiple scraper modules throughout the codebase.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch timestamp-update

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-24T03:34:19Z

Codecov Report

❌ Patch coverage is 80.88235% with 13 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...hbot_shared/services/file_export/video_download.py	76.19%	5 Missing ⚠️
...tchbot_shared/services/scrapers/bluesky/scraper.py	63.63%	4 Missing ⚠️
...fetchbot_shared/services/scrapers/weibo/scraper.py	69.23%	4 Missing ⚠️

@@           Coverage Diff           @@
##             main      #84   +/-   ##
=======================================
  Coverage        ?   80.96%           
=======================================
  Files           ?       66           
  Lines           ?     3636           
  Branches        ?        0           
=======================================
  Hits            ?     2944           
  Misses          ?      692           
  Partials        ?        0

Files with missing lines	Coverage Δ
...ared/fastfetchbot_shared/database/mongodb/cache.py	`100.00% <100.00%> (ø)`
...etchbot_shared/database/mongodb/models/metadata.py	`97.77% <100.00%> (ø)`
...shared/fastfetchbot_shared/models/metadata_item.py	`97.67% <100.00%> (ø)`
...ed/fastfetchbot_shared/services/scrapers/common.py	`95.31% <100.00%> (ø)`
...tfetchbot_shared/services/scrapers/general/base.py	`100.00% <100.00%> (ø)`
...hbot_shared/services/scrapers/general/firecrawl.py	`100.00% <100.00%> (ø)`
packages/shared/fastfetchbot_shared/utils/parse.py	`46.30% <100.00%> (ø)`
...tchbot_shared/services/scrapers/bluesky/scraper.py	`97.33% <63.63%> (ø)`
...fetchbot_shared/services/scrapers/weibo/scraper.py	`98.73% <69.23%> (ø)`
...hbot_shared/services/file_export/video_download.py	`83.54% <76.19%> (ø)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/shared/fastfetchbot_shared/database/mongodb/cache.py`:
- Line 77: Replace the unsafe bypass of Pydantic validation by changing the call
to Metadata.model_construct(**document_data) to use the normal constructor
Metadata(**document_data) in the cache logic; update the surrounding code in the
function that builds persisted metadata (where doc is created) to import and
catch pydantic.ValidationError (or ValueError) so you can log/handle invalid
documents instead of allowing malformed payloads to be persisted.

In `@packages/shared/fastfetchbot_shared/models/metadata_item.py`:
- Line 122: The serializer is currently hardcoding "telegraph_url": "" which
drops stored telegraph links; update the serialization in the MetadataItem
(likely the to_dict/serialize method in metadata_item.py) to include the actual
attribute (e.g. self.telegraph_url or getattr(self, "telegraph_url", ""))
instead of the empty string so existing telegraph links are preserved in output
payloads and downstream storage/response.

In `@packages/shared/fastfetchbot_shared/services/scrapers/bluesky/scraper.py`:
- Around line 197-206: The helper _parse_bluesky_created_at currently swallows
parse errors and returns None without any trace; update it to log the failure
before returning so exceptions aren't silent: catch the (TypeError, ValueError)
in _parse_bluesky_created_at and call logger.exception (or logger.debug with the
exception) including the input created_at and context (e.g., "failed to parse
created_at") before returning None, ensuring the module has an
imported/initialized logger variable referenced by name in the change.

In `@packages/shared/fastfetchbot_shared/services/scrapers/common.py`:
- Line 93: The current line sets result["timestamp"] using
result.pop("published_timestamp", None), which overwrites legacy records'
existing "timestamp" with None; change the logic in the block that manipulates
the result dict so that if "published_timestamp" exists you move it into
"timestamp" (remove the old key) but if it does not exist you leave any existing
result["timestamp"] intact (or leave it absent) — i.e., prefer
result["published_timestamp"] when present, otherwise keep result["timestamp"];
adjust the code around the result dict (the use of
result.pop("published_timestamp", None) and assignment to result["timestamp"])
accordingly.

In `@packages/shared/fastfetchbot_shared/services/scrapers/general/zyte.py`:
- Around line 90-96: The ValueError in _parse_zyte_date_published is being
swallowed; modify the except block to log the failure before returning None—use
the module logger (e.g., logger.exception or logger.error with exception info)
and include the raw value being parsed so upstream bad timestamps are visible;
if a logger isn't present, import or create one (e.g., getLogger(__name__));
keep the current behavior of returning None after logging.

In `@packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py`:
- Around line 284-295: The function _parse_instagram_timestamp currently
swallows parse errors and invalid values; update it to use a module-level logger
(e.g., logger = logging.getLogger(__name__)) and log failures instead of
silently returning None: call logger.exception or logger.warning inside the
except (TypeError, ValueError) block including the raw value, and also emit a
logger.warning when timestamp <= 0 (include the parsed timestamp and original
value); keep returning None for invalid cases but ensure all failure paths are
logged with clear context identifying _parse_instagram_timestamp and the
offending value.

In `@packages/shared/fastfetchbot_shared/services/scrapers/twitter/__init__.py`:
- Around line 336-345: The helper _parse_twitter_created_at silently swallows
parsing exceptions and returns None; update it to log the failure and document
the fallback: catch the (TypeError, ValueError, IndexError) from
parsedate_to_datetime, call the module logger (e.g., logger.exception or
logger.error with exc_info=True) with a clear message including the raw
created_at string, and then return None (or re-raise if you prefer explicit
failure), and add/update the _parse_twitter_created_at docstring to state that
parsing errors are logged and None is returned as the documented fallback. Use
the function name _parse_twitter_created_at and the parsedate_to_datetime call
to locate where to add logging.

In `@packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py`:
- Around line 534-543: The _parse_weibo_created_at helper currently swallows
parsing exceptions and returns None silently; update it to log parsing failures
and document the fallback behavior: import or use the module logger (e.g.,
logger) inside _parse_weibo_created_at, catch the (TypeError, ValueError,
IndexError) exception and call logger.exception(...) or logger.warning(...) with
the created_at value and error details, and add a short comment stating that
returning None is the documented fallback when parsing fails; ensure the
function still returns None after logging to preserve current behavior.

In `@tests/unit/scrapers/test_wechat.py`:
- Line 60: The test currently contradicts the PR intent by asserting the
injected fixture `ct` yields no timestamp; update the assertion in
tests/unit/scrapers/test_wechat.py (the failing line that reads assert
"timestamp" not in result) to instead assert that result contains "timestamp"
and that its value equals the normalized timestamp derived from the fixture `ct`
(use the same normalization routine the Wechat scraper uses — e.g., the
scraper's timestamp parsing/normalization helper such as
normalize_timestamp/parse_ct or the Wechat scraper method that produces
ISO8601/unix output) so the test verifies presence and correct normalized value
rather than absence.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 715934f5-381b-4d78-aebb-13cb30345c43

📥 Commits

Reviewing files that changed from the base of the PR and between 381b7ff and 6ea5363.

📒 Files selected for processing (27)

apps/api/src/services/inoreader/__init__.py
packages/shared/fastfetchbot_shared/database/mongodb/cache.py
packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py
packages/shared/fastfetchbot_shared/models/metadata_item.py
packages/shared/fastfetchbot_shared/services/file_export/video_download.py
packages/shared/fastfetchbot_shared/services/scrapers/bluesky/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/bluesky/scraper.py
packages/shared/fastfetchbot_shared/services/scrapers/common.py
packages/shared/fastfetchbot_shared/services/scrapers/douban/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/general/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/general/base.py
packages/shared/fastfetchbot_shared/services/scrapers/general/firecrawl.py
packages/shared/fastfetchbot_shared/services/scrapers/general/zyte.py
packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/reddit/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/twitter/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/wechat/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/weibo/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py
packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/zhihu/__init__.py
packages/shared/fastfetchbot_shared/utils/parse.py
tests/unit/database/mongodb/test_cache.py
tests/unit/scrapers/test_common_cache.py
tests/unit/scrapers/test_wechat.py
tests/unit/test_published_timestamp.py

coderabbitai · 2026-05-24T03:40:36Z

-            "telegraph_url": "", "content": from_str(self.content),
-            "text": from_str(self.text),
+            "url": from_str(getattr(self, "url", "")),
+            "telegraph_url": "",


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve telegraph_url during serialization.

Line 122 hardcodes telegraph_url to an empty string, which drops existing telegraph links and causes data loss in downstream storage/response payloads.

💡 Proposed fix

- "telegraph_url": "", + "telegraph_url": from_str(getattr(self, "telegraph_url", "")),

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/shared/fastfetchbot_shared/models/metadata_item.py` at line 122, The serializer is currently hardcoding "telegraph_url": "" which drops stored telegraph links; update the serialization in the MetadataItem (likely the to_dict/serialize method in metadata_item.py) to include the actual attribute (e.g. self.telegraph_url or getattr(self, "telegraph_url", "")) instead of the empty string so existing telegraph links are preserved in output payloads and downstream storage/response.

coderabbitai · 2026-05-24T03:40:36Z

+def _parse_bluesky_created_at(created_at: str | None) -> int | None:
+    if not created_at:
+        return None
+    try:
+        parsed = datetime.datetime.fromisoformat(created_at.replace("Z", "+00:00"))
+    except (TypeError, ValueError):
+        return None
+    if parsed.tzinfo is None:
+        parsed = parsed.replace(tzinfo=datetime.timezone.utc)
+    return int(parsed.timestamp())


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid silent parse failures in timestamp helper.

This helper swallows parse exceptions and returns None without logging or documented handling. Add at least debug/exception logging (or explicitly document this fallback behavior).

As per coding guidelines, "Never silently swallow exceptions—always either re-raise, log with logger.exception(), or explicitly handle and document the behavior. Do not return None or empty data on failure without logging".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/shared/fastfetchbot_shared/services/scrapers/bluesky/scraper.py` around lines 197 - 206, The helper _parse_bluesky_created_at currently swallows parse errors and returns None without any trace; update it to log the failure before returning so exceptions aren't silent: catch the (TypeError, ValueError) in _parse_bluesky_created_at and call logger.exception (or logger.debug with the exception) including the input created_at and context (e.g., "failed to parse created_at") before returning None, ensuring the module has an imported/initialized logger variable referenced by name in the change.

coderabbitai · 2026-05-24T03:40:36Z

                if cached is not None:
                    logger.info("Cache hit, returning cached metadata")
                    result = cached.model_dump(mode="json", exclude={"id"})
+                    result["timestamp"] = result.pop("published_timestamp", None)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep legacy cache timestamps when published_timestamp is missing.

Line 93 sets timestamp to None for older cached records that only contain timestamp, which drops usable data.

💡 Proposed fix

- result["timestamp"] = result.pop("published_timestamp", None) + result["timestamp"] = result.pop( + "published_timestamp", result.get("timestamp") + )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/shared/fastfetchbot_shared/services/scrapers/common.py` at line 93, The current line sets result["timestamp"] using result.pop("published_timestamp", None), which overwrites legacy records' existing "timestamp" with None; change the logic in the block that manipulates the result dict so that if "published_timestamp" exists you move it into "timestamp" (remove the old key) but if it does not exist you leave any existing result["timestamp"] intact (or leave it absent) — i.e., prefer result["published_timestamp"] when present, otherwise keep result["timestamp"]; adjust the code around the result dict (the use of result.pop("published_timestamp", None) and assignment to result["timestamp"]) accordingly.

coderabbitai · 2026-05-24T03:40:36Z

+def _parse_instagram_timestamp(value: Any) -> int | None:
+    if value in (None, ""):
+        return None
+    try:
+        timestamp = int(value)
+    except (TypeError, ValueError):
+        return None
+    if timestamp <= 0:
+        return None
+    if timestamp > 10_000_000_000:
+        timestamp //= 1000
+    return timestamp


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make invalid timestamp handling observable.

_parse_instagram_timestamp returns None on parse failures without logging or explicit documented behavior. Please add logging for failed conversions (or document silent fallback intent).

As per coding guidelines, "Never silently swallow exceptions—always either re-raise, log with logger.exception(), or explicitly handle and document the behavior. Do not return None or empty data on failure without logging".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py` around lines 284 - 295, The function _parse_instagram_timestamp currently swallows parse errors and invalid values; update it to use a module-level logger (e.g., logger = logging.getLogger(__name__)) and log failures instead of silently returning None: call logger.exception or logger.warning inside the except (TypeError, ValueError) block including the raw value, and also emit a logger.warning when timestamp <= 0 (include the parsed timestamp and original value); keep returning None for invalid cases but ensure all failure paths are logged with clear context identifying _parse_instagram_timestamp and the offending value.

coderabbitai · 2026-05-24T03:40:36Z

+def _parse_twitter_created_at(created_at: str | None) -> int | None:
+    if not created_at:
+        return None
+    try:
+        parsed = parsedate_to_datetime(created_at)
+    except (TypeError, ValueError, IndexError):
+        return None
+    if parsed.tzinfo is None:
+        parsed = parsed.replace(tzinfo=datetime.timezone.utc)
+    return int(parsed.timestamp())


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t silently suppress Twitter date parse errors.

Line 341-342 returns None on parse failure without logging or explicit documented fallback behavior. Please log the failure path (or document expected silent fallback).

As per coding guidelines, "Never silently swallow exceptions—always either re-raise, log with logger.exception(), or explicitly handle and document the behavior. Do not return None or empty data on failure without logging".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/shared/fastfetchbot_shared/services/scrapers/twitter/__init__.py` around lines 336 - 345, The helper _parse_twitter_created_at silently swallows parsing exceptions and returns None; update it to log the failure and document the fallback: catch the (TypeError, ValueError, IndexError) from parsedate_to_datetime, call the module logger (e.g., logger.exception or logger.error with exc_info=True) with a clear message including the raw created_at string, and then return None (or re-raise if you prefer explicit failure), and add/update the _parse_twitter_created_at docstring to state that parsing errors are logged and None is returned as the documented fallback. Use the function name _parse_twitter_created_at and the parsedate_to_datetime call to locate where to add logging.

coderabbitai · 2026-05-24T03:40:37Z

+def _parse_weibo_created_at(created_at: str | None) -> int | None:
+    if not created_at:
+        return None
+    try:
+        parsed = parsedate_to_datetime(created_at)
+    except (TypeError, ValueError, IndexError):
+        return None
+    if parsed.tzinfo is None:
+        parsed = parsed.replace(tzinfo=datetime.timezone.utc)
+    return int(parsed.timestamp())


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add logging or documented fallback for Weibo timestamp parse failures.

The helper currently suppresses parse exceptions and returns None without logging, which makes failures opaque.

As per coding guidelines, "Never silently swallow exceptions—always either re-raise, log with logger.exception(), or explicitly handle and document the behavior. Do not return None or empty data on failure without logging".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py` around lines 534 - 543, The _parse_weibo_created_at helper currently swallows parsing exceptions and returns None silently; update it to log parsing failures and document the fallback behavior: import or use the module logger (e.g., logger) inside _parse_weibo_created_at, catch the (TypeError, ValueError, IndexError) exception and call logger.exception(...) or logger.warning(...) with the created_at value and error details, and add a short comment stating that returning None is the documented fallback when parsing fails; ensure the function still returns None after logging to preserve current behavior.

coderabbitai · 2026-05-24T03:40:37Z

        assert result["title"] == "Test Title"
        assert result["author"] == "Test Author"
        assert "Test content paragraph" in result["content"]
+        assert "timestamp" not in result


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Assertion contradicts the timestamp propagation goal for Wechat.

This test now injects ct in the fixture but asserts timestamp must be absent, which enforces the opposite of the PR objective for Wechat timestamp support. Update the expectation to assert normalized timestamp output instead.

Suggested test fix

- assert "timestamp" not in result + assert result["timestamp"] == 1704067200

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

assert "timestamp" not in result

assert result["timestamp"] == 1704067200

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/scrapers/test_wechat.py` at line 60, The test currently contradicts the PR intent by asserting the injected fixture `ct` yields no timestamp; update the assertion in tests/unit/scrapers/test_wechat.py (the failing line that reads assert "timestamp" not in result) to instead assert that result contains "timestamp" and that its value equals the normalized timestamp derived from the fixture `ct` (use the same normalization routine the Wechat scraper uses — e.g., the scraper's timestamp parsing/normalization helper such as normalize_timestamp/parse_ct or the Wechat scraper method that produces ISO8601/unix output) so the test verifies presence and correct normalized value rather than absence.

feat: add timestamp for scraper content

6ea5363

Update zyte.py

99eaedd

coderabbitai Bot reviewed May 24, 2026

View reviewed changes

fix mongodb cache

ff68610

aturret merged commit 6a31aa2 into main May 24, 2026
5 checks passed

aturret deleted the timestamp-update branch May 24, 2026 13:56

	assert "timestamp" not in result
	assert result["timestamp"] == 1704067200

Conversation

aturret commented May 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aturret commented May 24, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 24, 2026 •

edited

Loading

codecov Bot commented May 24, 2026 •

edited

Loading