Ultimate Guide: Export Your Reddit Data to Markdown Using Python & PRAW API

Complete tutorial on exporting Reddit submissions, comments, and saved posts to Markdown format with a powerful Python script. Includes media downloads, retry mechanisms, and full conversation threads. Perfect for data analysis, backup, or content migration.

Daniel Kliewer

Author, Sovereign AI

Tags: python, reddit, praw, data-export, markdown, api, automation, backup, data-analysis

From the Book

This is from Sovereign AI: Building Local-First Intelligent Systems.

Ultimate Guide: How to Export Your Reddit Data to Markdown Using Python & PRAW API

Are you tired of scattered Reddit posts and comments lost in the digital void? Do you want a backup of your Reddit activity for analysis, migration, or archiving? This guide shows how to export your Reddit history (submissions, comments, saved posts, and even media files) into clean, structured Markdown files using a Python script.

Whether you're a data enthusiast looking to analyze your online behavior, a content creator migrating posts, or simply someone who wants a searchable backup of their digital footprint, this tutorial provides everything you need. The script handles rate limits, resumes interrupted downloads, and preserves full conversation threads with complete parent/child relationships.

Why Export Reddit Data to Markdown?

Before diving into the technical details, let's explore why you might want to export your Reddit data:

Comprehensive Backup & Archival

Reddit is volatile—posts get deleted, accounts get banned, and threads disappear. Having a local Markdown archive ensures you never lose access to your contributions or valuable discussions.

Data Analysis & Personal Insights

With your data in Markdown format, you can easily analyze patterns in your posting behavior, most discussed topics, or even use text analysis tools to gain insights into your online personality.

Content Migration

Moving from Reddit to your own blog? This script exports everything in a format that's ready for platforms like WordPress, Hugo, or Jekyll.

Enhanced Searchability

Unlike Reddit's search, your local Markdown files can be indexed with tools like Elasticsearch or even searched with simple grep commands.

Academic or Research Purposes

Researchers often need to analyze large datasets—having Reddit threads in Markdown format makes text processing dramatically easier.

Prerequisites & Requirements

Before we start, ensure you have:

  • Python 3.8+ installed on your system (the script calls Path.unlink(missing_ok=True), which was added in Python 3.8)
  • A Reddit account with API access configured
  • Basic familiarity with command-line operations
  • Sufficient disk space for your export (depends on how much you've posted/saved)

The script uses several Python libraries that we'll install later, including PRAW for Reddit API access, markdownify for HTML-to-Markdown conversion, and tqdm for progress tracking.

Step 1: Setting Up Reddit API Access

To access Reddit's API (which this script relies on), you'll need to create an application through Reddit's app interface. This is free and takes about 2 minutes.

First, create a praw.ini file in your project directory with the following contents, and fill in the values from the Reddit app you created. You can create and manage apps at https://www.reddit.com/prefs/apps; the step-by-step app creation guide appears later in this article.

ini
[DEFAULT]
client_id=
client_secret=
username=
password=
user_agent=reddit-export-script by /u/
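Before running a long export, it can help to confirm that no value in praw.ini was left blank. The following is a minimal sketch using only the standard library; `missing_praw_keys` and `REQUIRED_KEYS` are hypothetical helper names, not part of PRAW or of the export script.

```python
import configparser
from pathlib import Path
from typing import List

# Keys used by the praw.ini shown above (assumed to all be required here).
REQUIRED_KEYS = ("client_id", "client_secret", "username", "password", "user_agent")


def missing_praw_keys(ini_path: Path, section: str = "DEFAULT") -> List[str]:
    """Return the required keys that are absent or empty in the given section."""
    # interpolation=None so a "%" in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(ini_path, encoding="utf-8")  # a missing file is silently skipped
    values = parser[section]  # the DEFAULT section always exists
    return [key for key in REQUIRED_KEYS if not values.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_praw_keys(Path("praw.ini"))
    print("Missing values:", ", ".join(missing) if missing else "none")
```

An empty list means every key has a value; it does not prove the values are correct, only that nothing was left blank.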

Next, create a Python script named reddit_export.py and save the following code in it.

python
#!/usr/bin/env python3
"""
reddit_export.py

Export Reddit user content to markdown with:
 - automatic retry/backoff on 429 (uses Retry-After if provided)
 - save & resume progress via state.json
 - full parent chain + child replies for comments
 - concurrent media downloads
 - index.json and index.csv

Dependencies:
 pip install praw markdownify python-frontmatter requests tqdm
"""

import argparse
import csv
import json
import logging
import os
import re
import sys
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Tuple, Any, Optional

import frontmatter
import requests
from markdownify import markdownify as md
from tqdm import tqdm

import praw
import prawcore
from praw.models import Submission, Comment

# ---------- Logging ----------
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s: %(message)s")
LOG = logging.getLogger("reddit_export")

# ---------- Utilities ----------
def safe_slug(s: str, maxlen: int = 100) -> str:
    """Turn arbitrary text into a filesystem-friendly slug."""
    s = (s or "").strip()
    s = re.sub(r'[\s/\\]+', '-', s)
    s = re.sub(r'[^A-Za-z0-9_\-\.]+', '', s)
    return s[:maxlen].strip('-')

def ts_to_iso(ts: float) -> str:
    """Convert a Unix timestamp to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()

def ensure_dir(p: Path):
    p.mkdir(parents=True, exist_ok=True)

def atomic_write_json(path: Path, obj: Any):
    """Write JSON to a temp file in the same directory, then swap it into place."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".json", dir=path.parent,
                                     delete=False, encoding="utf-8") as fh:
        json.dump(obj, fh, indent=2)
        temp_path = Path(fh.name)
    try:
        temp_path.replace(path)
    except Exception as e:
        LOG.warning("Failed to atomically replace %s: %s. Writing directly.", path, e)
        with path.open("w", encoding="utf-8") as fh:
            json.dump(obj, fh, indent=2)
        temp_path.unlink(missing_ok=True)

# ---------- Retry decorator ----------
def retry_on_rate_limit(max_attempts: int = 6, base_sleep: float = 2.0):
    """Retry the wrapped call on HTTP 429 or network errors with backoff."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return fn(*args, **kwargs)
                except prawcore.exceptions.TooManyRequests as e:
                    attempt += 1
                    if attempt > max_attempts:
                        LOG.error("Max retry attempts reached for %s", fn.__name__)
                        raise
                    wait = None
                    resp = getattr(e, "response", None)
                    # Compare against None: a requests.Response is falsy for
                    # 4xx/5xx statuses, so "if resp" would skip every 429.
                    if resp is not None and hasattr(resp, "headers"):
                        try:
                            wait = float(resp.headers.get("Retry-After"))
                        except (TypeError, ValueError):
                            wait = None  # header missing or not a number of seconds
                    if wait is None:
                        wait = base_sleep * (2 ** (attempt - 1))
                    LOG.warning("Rate limited on %s: sleeping %s seconds (attempt %d/%d)", fn.__name__, wait, attempt, max_attempts)
                    time.sleep(wait)
                except prawcore.exceptions.RequestException as e:
                    attempt += 1
                    if attempt > max_attempts:
                        LOG.exception("Network error and max attempts reached for %s", fn.__name__)
                        raise
                    wait = base_sleep * (2 ** (attempt - 1))
                    LOG.warning("RequestException in %s: %s; sleeping %s seconds (attempt %d/%d)", fn.__name__, e, wait, attempt, max_attempts)
                    time.sleep(wait)
        return wrapper
    return decorator

# ---------- Media download ----------
def download_file(session: requests.Session, url: str, dest: Path, timeout: int = 30) -> Tuple[str, str, bool]:
    """Stream one URL to dest. Returns (url, dest, success)."""
    try:
        with session.get(url, stream=True, timeout=timeout) as r:
            r.raise_for_status()
            ensure_dir(dest.parent)
            with open(dest, "wb") as fh:
                for chunk in r.iter_content(1024 * 64):
                    if chunk:
                        fh.write(chunk)
        return (url, str(dest), True)
    except Exception as e:
        LOG.debug("Failed to download %s -> %s: %s", url, dest, e)
        # Remove any partial file so a later run retries this download.
        dest.unlink(missing_ok=True)
        return (url, str(dest), False)

# ---------- Markdown builders ----------
def make_submission_markdown(item: Submission) -> Tuple[Dict, str, List[Tuple[str, Path, Dict]]]:
    """Return (frontmatter dict, markdown body, media download tasks)."""
    fm = {
        "id": item.id,
        "type": "submission",
        "title": item.title,
        "subreddit": str(item.subreddit),
        "author": str(item.author) if item.author else None,
        "created_utc": ts_to_iso(item.created_utc),
        "score": item.score,
        "num_comments": item.num_comments,
        "permalink": f"https://reddit.com{item.permalink}",
        "url": item.url,
        "over_18": item.over_18,
        "is_self": item.is_self,
        "distinguished": item.distinguished,
        "stickied": item.stickied,
        "edited": item.edited,
    }
    body_md = ""
    # Each task is (source URL, destination relative to outdir, metadata).
    media_tasks: List[Tuple[str, Path, Dict]] = []

    if item.is_self:
        body_md = md(getattr(item, "selftext_html", None) or item.selftext or "")
    else:
        body_md = f"[External URL]({item.url})\n\n"
        p = getattr(item, "preview", None)
        if p and "images" in p:
            for idx, im in enumerate(p["images"]):
                src = im.get("source", {}).get("url")
                if src:
                    # Reddit returns HTML-escaped URLs; undo the escaping.
                    src = src.replace("&amp;", "&")
                    body_md += f"![preview-{idx}]({src})\n\n"
                    ext = Path(src.split("?")[0]).suffix or ".jpg"
                    dest = Path("media") / f"sub_{item.id}" / f"{item.id}_preview_{idx}{ext}"
                    media_tasks.append((src, dest, {}))

    # gallery support
    if getattr(item, "is_gallery", False):
        md_meta = getattr(item, "media_metadata", {}) or {}
        gallery = []
        # gallery_data can be None (e.g. removed galleries), hence the "or {}".
        for g in (getattr(item, "gallery_data", None) or {}).get("items", []):
            media_id = g.get("media_id")
            meta = md_meta.get(media_id, {})
            url = None
            if "s" in meta and "u" in meta["s"]:
                url = meta["s"]["u"]
            elif "p" in meta and meta["p"]:
                url = meta["p"][-1].get("u")
            if url:
                url = url.replace("&amp;", "&")
                gallery.append(url)
        for idx, src in enumerate(gallery):
            body_md += f"![gallery-{idx}]({src})\n\n"
            ext = Path(src.split("?")[0]).suffix or ".jpg"
            dest = Path("media") / f"sub_{item.id}" / f"{item.id}_gallery_{idx}{ext}"
            media_tasks.append((src, dest, {}))

    # reddit video
    if getattr(item, "is_video", False):
        rv = getattr(item, "media", {}) or {}
        if "reddit_video" in rv:
            vurl = rv["reddit_video"].get("fallback_url")
            if vurl:
                body_md += f"\n\n[Video]({vurl})\n\n"
                ext = Path(vurl.split("?")[0]).suffix or ".mp4"
                dest = Path("media") / f"sub_{item.id}" / f"{item.id}_video{ext}"
                media_tasks.append((vurl, dest, {}))

    if not body_md:
        body_md = item.selftext or ""

    return fm, body_md, media_tasks

def make_comment_markdown_base(comment: Comment) -> Tuple[Dict, str]:
    """Return (frontmatter dict, markdown body) for a single comment."""
    fm = {
        "id": comment.id,
        "type": "comment",
        "subreddit": str(comment.subreddit),
        "author": str(comment.author) if comment.author else None,
        "created_utc": ts_to_iso(comment.created_utc),
        "score": comment.score,
        "permalink": f"https://reddit.com{comment.permalink}",
        "parent_id": comment.parent_id,
        "link_id": comment.link_id,
    }
    body_md = md(getattr(comment, "body_html", None) or comment.body or "")
    return fm, body_md

# ---------- Comment tree helpers ----------
@retry_on_rate_limit()
def build_submission_comment_map(submission: Submission) -> Dict[str, Any]:
    """Map fullnames (t1_<id> comments, t3_<id> submission) to their objects."""
    try:
        submission.comments.replace_more(limit=None)
    except (prawcore.exceptions.TooManyRequests, prawcore.exceptions.RequestException):
        raise  # let the retry decorator handle rate limits and network errors
    except Exception as e:
        LOG.debug("replace_more limit=None raised: %s", e)
    all_comments = submission.comments.list()
    mapping: Dict[str, Any] = {}
    for c in all_comments:
        if isinstance(c, Comment):
            mapping[f"t1_{c.id}"] = c
    mapping[f"t3_{submission.id}"] = submission
    return mapping

def extract_parent_chain(comment: Comment, mapping: Dict[str, Any]) -> List[Any]:
    """Return the ancestors of a comment, ordered from the submission downward."""
    chain = []
    cur = getattr(comment, "parent_id", None)
    visited = set()
    while cur:
        if cur in visited:
            break
        visited.add(cur)
        obj = mapping.get(cur)
        if obj is None:
            break
        chain.insert(0, obj)
        if isinstance(obj, Submission):
            break
        cur = getattr(obj, "parent_id", None)
    return chain

def extract_child_subtree(comment_fullname: str, mapping: Dict[str, Any]) -> List[Comment]:
    """Return all replies below a comment (given as a t1_ fullname), depth-first."""
    parent_index: Dict[str, List[Comment]] = {}
    for fullname, obj in mapping.items():
        if isinstance(obj, Comment):
            parent_index.setdefault(obj.parent_id, []).append(obj)
    out: List[Comment] = []
    queue = parent_index.get(comment_fullname, [])[:]
    while queue:
        node = queue.pop(0)
        out.append(node)
        node_full = f"t1_{node.id}"
        children = parent_index.get(node_full, [])
        if children:
            # Prepend children so each reply is followed by its own replies.
            queue[0:0] = children
    return out

# ---------- Exporter ----------
class Exporter:
    def __init__(self, reddit: praw.Reddit, outdir: Path, download_media: bool, workers: int, state_file: Path):
        self.reddit = reddit
        self.outdir = outdir
        self.download_media = download_media
        self.workers = workers
        self.state_file = state_file
        self.state = {
            "processed_submissions": [],
            "processed_comments": [],
            "processed_saved": []
        }
        self._load_state()
        self.media_tasks: List[Tuple[str, Path, Dict]] = []
        self.index: List[Dict] = []
        self.submission_cache: Dict[str, Submission] = {}

    def _load_state(self):
        if self.state_file.exists():
            try:
                with self.state_file.open("r", encoding="utf-8") as fh:
                    self.state = json.load(fh)
            except Exception as e:
                LOG.warning("Failed to load state.json: %s. Starting fresh.", e)
                self.state = {
                    "processed_submissions": [],
                    "processed_comments": [],
                    "processed_saved": []
                }
        else:
            self._save_state()

    def _save_state(self):
        atomic_write_json(self.state_file, self.state)

    def _mark_processed(self, kind: str, id_: str):
        key = f"processed_{kind}"
        if id_ not in self.state.get(key, []):
            self.state.setdefault(key, []).append(id_)
            self._save_state()

    def queue_media(self, url: str, dest_rel: Path, meta: Dict):
        self.media_tasks.append((url, dest_rel, meta))

    def write_markdown(self, relpath: Path, fm: Dict, body_md: str) -> str:
        full = self.outdir / relpath
        ensure_dir(full.parent)
        post = frontmatter.Post(body_md, **fm)
        full.write_text(frontmatter.dumps(post), encoding="utf-8")
        self.index.append(fm)
        return str(relpath)

    @retry_on_rate_limit(max_attempts=10, base_sleep=5.0)
    def export_submission(self, submission: Submission):
        if submission.id in self.state["processed_submissions"]:
            return

        fm, body_md, media_tasks = make_submission_markdown(submission)
        relpath = Path("submissions") / f"{submission.id}.{safe_slug(submission.title)}.md"
        self.write_markdown(relpath, fm, body_md)
        for url, dest_rel, meta in media_tasks:
            self.queue_media(url, dest_rel, meta)
        # Mark as processed only after the file is written, so a failed or
        # retried export is not skipped on the next attempt.
        self._mark_processed("submissions", submission.id)

    @retry_on_rate_limit(max_attempts=10, base_sleep=5.0)
    def export_comment(self, comment: Comment):
        if comment.id in self.state["processed_comments"]:
            return

        submission = self.submission_cache.get(comment.link_id)
        if submission is None:
            submission = comment.submission
            self.submission_cache[comment.link_id] = submission

        submission_fm, _, _ = make_submission_markdown(submission)

        mapping = build_submission_comment_map(submission)
        parent_chain = extract_parent_chain(comment, mapping)
        # The map is keyed by fullname, so look up replies under "t1_<id>".
        child_subtree = extract_child_subtree(f"t1_{comment.id}", mapping)

        fm, body_md = make_comment_markdown_base(comment)

        all_parts = []
        for chain_item in parent_chain:
            if isinstance(chain_item, Submission):
                all_parts.append(f"## Submission: {submission_fm['title']}\n\n{chain_item.selftext or '[link]'}")
            else:
                _, c_md = make_comment_markdown_base(chain_item)
                all_parts.append(f"## Parent Comment\n\n{c_md}")

        all_parts.append(f"## This Comment\n\n{body_md}")

        for child in child_subtree:
            _, c_md = make_comment_markdown_base(child)
            all_parts.append(f"## Reply\n\n{c_md}")

        full_body_md = "\n\n---\n\n".join(all_parts)

        relpath = Path("comments") / f"{comment.id}_{ts_to_iso(comment.created_utc).replace(':', '-')}_{safe_slug(str(comment.subreddit))}.md"
        self.write_markdown(relpath, fm, full_body_md)
        # Mark as processed only after the file is written (see export_submission).
        self._mark_processed("comments", comment.id)

    def export_saved_item(self, item):
        # Saved listings mix Submission and Comment objects.
        if isinstance(item, Submission):
            self.export_submission(item)
        else:
            self.export_comment(item)

    def download_all_media(self):
        if not self.download_media or not self.media_tasks:
            return

        LOG.info("Downloading %d media files...", len(self.media_tasks))
        session = requests.Session()
        with ThreadPoolExecutor(max_workers=self.workers) as executor:
            futures = []
            for url, dest_rel, meta in self.media_tasks:
                dest_full = self.outdir / dest_rel
                if not dest_full.exists():
                    futures.append(executor.submit(download_file, session, url, dest_full))
            for future in tqdm(as_completed(futures), total=len(futures), desc="media"):
                url, dest, success = future.result()
                if not success:
                    LOG.warning("Media download failed: %s", url)

    def write_index_files(self):
        index_json = self.outdir / "index.json"
        atomic_write_json(index_json, self.index)

        index_csv = self.outdir / "index.csv"
        if self.index:
            # Submissions and comments have different keys; use the union so
            # DictWriter does not reject rows with fields missing from row 0.
            fieldnames = sorted({key for row in self.index for key in row})
            with index_csv.open("w", newline="", encoding="utf-8") as fh:
                writer = csv.DictWriter(fh, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(self.index)

# ---------- High-level flows ----------
# Note: these return lazy ListingGenerators; the network requests happen
# while the caller iterates over the result.
@retry_on_rate_limit()
def fetch_user_submissions(reddit: praw.Reddit, username: str, limit: Optional[int] = None):
    return reddit.redditor(username).submissions.new(limit=limit)

@retry_on_rate_limit()
def fetch_user_comments(reddit: praw.Reddit, username: str, limit: Optional[int] = None):
    return reddit.redditor(username).comments.new(limit=limit)

@retry_on_rate_limit()
def fetch_user_saved(reddit: praw.Reddit, username: str, limit: Optional[int] = None):
    return reddit.redditor(username).saved(limit=limit)

def main():
    parser = argparse.ArgumentParser(description="Reddit export with rate-limit retry + resume state")
    parser.add_argument("--username", required=True)
    parser.add_argument("--outdir", default="./reddit_export")
    parser.add_argument("--submissions", action="store_true")
    parser.add_argument("--comments", action="store_true")
    parser.add_argument("--saved", action="store_true")
    parser.add_argument("--limit", type=int, default=None)
    parser.add_argument("--download-media", action="store_true")
    parser.add_argument("--workers", type=int, default=8)
    parser.add_argument("--state-file", default="state.json")
    args = parser.parse_args()

    outdir = Path(args.outdir).expanduser()
    ensure_dir(outdir)
    state_file = Path(args.state_file).expanduser()

    # Use environment variables or praw.ini
    client_id = os.environ.get("REDDIT_CLIENT_ID")
    client_secret = os.environ.get("REDDIT_CLIENT_SECRET")
    user_agent = os.environ.get("REDDIT_USER_AGENT", "reddit_exporter")

    if not client_id or not client_secret:
        LOG.warning("Missing Reddit API credentials in environment variables; make sure praw.ini exists if exporting saved/private items.")
        reddit = praw.Reddit(site_name="DEFAULT")
    else:
        reddit = praw.Reddit(
            client_id=client_id,
            client_secret=client_secret,
            user_agent=user_agent
        )

    exporter = Exporter(reddit, outdir, download_media=args.download_media, workers=args.workers, state_file=state_file)

    if args.submissions:
        LOG.info("Fetching submissions for %s", args.username)
        for s in tqdm(fetch_user_submissions(reddit, args.username, limit=args.limit), desc="submissions"):
            try:
                exporter.export_submission(s)
            except Exception as e:
                LOG.exception("Error exporting submission %s: %s", getattr(s, "id", "<unknown>"), e)

    if args.comments:
        LOG.info("Fetching comments for %s", args.username)
        for c in tqdm(fetch_user_comments(reddit, args.username, limit=args.limit), desc="comments"):
            try:
                exporter.export_comment(c)
            except Exception as e:
                LOG.exception("Error exporting comment %s: %s", getattr(c, "id", "<unknown>"), e)

    if args.saved:
        LOG.info("Fetching saved items for %s", args.username)
        for item in tqdm(fetch_user_saved(reddit, args.username, limit=args.limit), desc="saved"):
            try:
                exporter.export_saved_item(item)
            except Exception as e:
                LOG.exception("Error exporting saved item: %s", e)

    exporter.download_all_media()
    exporter.write_index_files()
    LOG.info("Done. Output directory: %s", outdir)

if __name__ == "__main__":
    main()

Detailed Reddit App Creation Guide

I'll explain how to create the app step-by-step:

1. Log into Reddit

Go to reddit.com and log into your account.

2. Access the App Preferences

Navigate to the "Preferences" page by clicking on your username in the top right, then select "User Settings". On mobile, tap your profile icon and go to settings.

3. Create a New App

Scroll down to the bottom of the page and look for the "App" section. Click "Create App" or "Create Another App".

4. Fill in App Details

  • Name: Give your app a descriptive name like "Reddit Data Export" (choose something memorable)
  • App Type: Select "script"
  • Description: Optional, but you can add a brief description
  • About URL: Leave blank (optional)
  • Redirect URI: Use http://localhost:8080 (required for scripts, though not used)

5. Get Your App Credentials

After creating the app, you'll see:

  • client_id: This is the string under the app name
  • client_secret: The "secret" value shown

Important Security Note: Never share your client_secret publicly. It's like a password for your app's access to Reddit.

6. Configure Your praw.ini File

Create a new file in your project directory named praw.ini and fill in the values as shown above.

Step 2: Understanding the Python Export Script

Now that you have your Reddit API credentials set up, let's dive into the Python script that does the heavy lifting. This isn't just a simple exporter—it's a robust tool designed for production use with advanced features you won't find in basic Reddit exporters.

Key Features of This Script:

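  • Rate Limit Handling: Reddit rate-limits API clients (this guide assumes roughly 600 requests per 10 minutes; the live quota is reported in the X-Ratelimit-Remaining and X-Ratelimit-Reset response headers). On HTTP 429 the script waits for the Retry-After interval if one is given and otherwise backs off exponentially.
  • Resume Capability: The IDs of exported items are recorded in a state.json file, so a restarted export skips everything that was already written.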
  • Full Conversation Trees: For comments, it exports complete threads including parent posts and all child replies.
  • Media Downloads: Downloads images, videos, and gallery content concurrently.
  • Progress Tracking: Real-time progress bars show exactly what's happening.
  • Selective Export: Choose to export submissions, comments, or saved posts individually or together.
  • Concurrent Processing: Uses threading to download multiple files simultaneously.

Script Architecture Breakdown

The script is organized into several key components:

Rate Limiting & Retry Logic

Reddit's API enforces rate limits. The script wraps API calls in a retry decorator that waits for the Retry-After interval when Reddit supplies one and otherwise doubles the delay after each failed attempt.
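To make the backoff schedule concrete, here is a small sketch of the delay calculation on its own. `retry_delay` is a hypothetical helper written for illustration; it mirrors the formula in the script (base_sleep doubled per attempt) but is not taken from PRAW.

```python
from typing import Optional


def retry_delay(attempt: int, base_sleep: float = 2.0,
                retry_after: Optional[str] = None) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    if retry_after:
        try:
            # Prefer the server's hint when it is a plain number of seconds.
            return float(retry_after)
        except ValueError:
            pass  # not numeric; fall back to exponential backoff
    return base_sleep * (2 ** (attempt - 1))


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"attempt {n}: wait {retry_delay(n)} s")
```

With the default base_sleep of 2 seconds and six attempts, the waits add up to a little over two minutes before the script gives up on a single call.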

Media Download System

Concurrent download of images, videos, and other media with progress tracking and error handling.
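The concurrency pattern can be sketched without touching the network. In this illustration `run_downloads` is a hypothetical helper and `fetch` is any callable you supply (in the real script that role is played by download_file); a failing task is recorded as False instead of aborting the batch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def run_downloads(tasks: Iterable[Tuple[str, str]],
                  fetch: Callable[[str, str], bool],
                  workers: int = 8) -> Dict[str, bool]:
    """Run fetch(url, dest) for every task on a thread pool.

    Returns a mapping of url -> success flag.
    """
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, url, dest): url for url, dest in tasks}
        # as_completed yields each future as soon as it finishes,
        # regardless of submission order.
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = bool(future.result())
            except Exception:
                results[url] = False  # one bad download must not stop the rest
    return results
```

Threads suit this job because each download spends most of its time waiting on the network rather than using the CPU.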

Comment Thread Reconstruction

The script fetches the submission's full comment tree, then follows parent_id links upward to collect a comment's ancestors and downward to collect every reply beneath it.
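The idea is easier to see on plain data. The sketch below uses a dictionary of fullname to parent fullname in place of PRAW objects; `parent_chain` and `descendants` are hypothetical names for illustration. Reddit prefixes comment IDs with t1_ and submission IDs with t3_.

```python
from typing import Dict, List, Optional


def parent_chain(fullname: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Ancestors of `fullname`, ordered from the submission down."""
    chain: List[str] = []
    seen = set()
    cur = parents.get(fullname)
    while cur and cur not in seen:  # `seen` guards against malformed cycles
        seen.add(cur)
        chain.insert(0, cur)
        cur = parents.get(cur)
    return chain


def descendants(fullname: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """All replies below `fullname`, depth-first in listing order."""
    children: Dict[str, List[str]] = {}
    for child, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(child)
    out: List[str] = []
    stack = list(reversed(children.get(fullname, [])))
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(reversed(children.get(node, [])))
    return out


if __name__ == "__main__":
    # t3_s is the submission; a replies to it, b and d reply to a, c replies to b.
    tree = {"t3_s": None, "t1_a": "t3_s", "t1_b": "t1_a", "t1_c": "t1_b", "t1_d": "t1_a"}
    print(parent_chain("t1_c", tree))
    print(descendants("t1_a", tree))
```

Note that both lookups are keyed by the full name, prefix included; passing a bare comment ID would find nothing.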

State Management

JSON-based state tracking ensures you never lose progress and can resume interrupted exports.
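A stripped-down version of the same idea, using only the standard library, looks like this. `load_state` and `save_state` are hypothetical helpers for illustration; the key names match the ones the script stores in state.json.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read the state file, falling back to an empty state if missing or corrupt."""
    empty = {"processed_submissions": [], "processed_comments": [], "processed_saved": []}
    try:
        loaded = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        loaded = {}
    return {**empty, **loaded} if isinstance(loaded, dict) else empty


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(suffix=".json", dir=path.parent)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    # os.replace is atomic on the same filesystem, so a crash never
    # leaves a half-written state file behind.
    os.replace(tmp_name, path)
```

Writing to a temporary file first matters because the state file is rewritten after every exported item; an interrupted plain write could otherwise corrupt the record of what has been done.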

Step 3: Installing Dependencies & Running the Script

With your API credentials configured and the script ready, let's set up the environment and run your export.

1. Create a Virtual Environment (Recommended)

Virtual environments keep your project dependencies isolated from your system Python.

bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

2. Upgrade pip and Install Dependencies

Always start by upgrading pip for the latest package management features.

bash
pip install --upgrade pip
pip install praw markdownify python-frontmatter requests tqdm

Note: If you encounter installation issues, you may need additional system packages:

  • Ubuntu/Debian: sudo apt-get install python3-dev
  • macOS: brew install python (if using Homebrew)
  • Windows: Usually works out-of-the-box

3. Prepare the Script

Save the Python code above as reddit_export.py in your project directory alongside praw.ini.

4. Run Your Export

Choose your export options based on what you want to archive:

Export Everything (Submissions, Comments, Saved)

bash
python3 reddit_export.py --username YOUR_USERNAME --outdir ./reddit_export --submissions --comments --saved --download-media

Export Only Submissions

bash
python3 reddit_export.py --username YOUR_USERNAME --outdir ./reddit_export --submissions

Export Only Comments

bash
python3 reddit_export.py --username YOUR_USERNAME --outdir ./reddit_export --comments

Export Saved Posts Only

bash
python3 reddit_export.py --username YOUR_USERNAME --outdir ./reddit_export --saved

Understanding Script Options & Parameters

  • --username: Your Reddit username (required)
  • --outdir: Directory for exported files (default: ./reddit_export)
  • --submissions: Export your submitted posts
  • --comments: Export your comments and replies
  • --saved: Export your saved posts and comments
  • --download-media: Download images, videos, and other media
  • --limit: Limit number of items per type (optional, useful for testing)
  • --workers: Number of concurrent download threads (default: 8)
  • --state-file: Location of progress tracking file (default: state.json)

Advanced Configuration & Customization

Environment Variables (Alternative to praw.ini)

For enhanced security, you can use environment variables instead of the config file:

bash
export REDDIT_CLIENT_ID="your_client_id"
export REDDIT_CLIENT_SECRET="your_client_secret"
export REDDIT_USER_AGENT="reddit-export-script by /u/your_username"

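Then run without the config file. With only a client ID and secret, the script creates a read-only PRAW session, which is enough for public submissions and comments; --saved needs the username and password from praw.ini.

bash
python3 reddit_export.py --username YOUR_USERNAME --outdir ./reddit_export --submissions --comments --download-media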
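If you go the environment-variable route, a quick check like the following can tell you which variables are still unset before you start. This is only a sketch; `missing_credentials` is a hypothetical helper, and the variable names are the ones the script reads.

```python
import os
from typing import List, Mapping, Optional

# The script falls back to praw.ini unless both of these are set.
REQUIRED_ENV = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Not set:", ", ".join(missing), "(the script will use praw.ini)")
    else:
        print("Environment credentials found.")
```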

What Gets Exported & File Organization

Directory Structure

Your export creates a clean, organized structure:

text
reddit_export/
├── submissions/          # All your posts
│   ├── abc123.post-title.md
│   └── def456.another-post.md
├── comments/             # All your comments
│   ├── comment_id_timestamp_subreddit.md
│   └── ...
├── media/                # Downloaded images/videos
│   ├── sub_abc123/
│   └── sub_def456/
├── index.json            # Complete metadata index
├── index.csv             # CSV format for easy filtering
└── state.json            # Progress tracking

Note: with the default --state-file value, state.json is written to the directory you run the script from. Pass --state-file ./reddit_export/state.json to keep it inside the export folder as shown.
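The file names in this layout follow the script's naming rules: submissions are named from the ID plus a slug of the title, and comments from the ID, the UTC timestamp (with colons replaced), and the subreddit. The sketch below reproduces those rules in isolation; `submission_path` and `comment_path` are hypothetical helper names, while `safe_slug` copies the script's function.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def safe_slug(text: str, maxlen: int = 100) -> str:
    """Replace whitespace and slashes with '-', drop other unsafe characters."""
    text = (text or "").strip()
    text = re.sub(r"[\s/\\]+", "-", text)
    text = re.sub(r"[^A-Za-z0-9_\-.]+", "", text)
    return text[:maxlen].strip("-")


def submission_path(submission_id: str, title: str) -> Path:
    return Path("submissions") / f"{submission_id}.{safe_slug(title)}.md"


def comment_path(comment_id: str, created_utc: float, subreddit: str) -> Path:
    # Colons are replaced because they are not allowed in Windows file names.
    stamp = datetime.fromtimestamp(created_utc, tz=timezone.utc).isoformat().replace(":", "-")
    return Path("comments") / f"{comment_id}_{stamp}_{safe_slug(subreddit)}.md"


if __name__ == "__main__":
    print(submission_path("abc123", "My Post: Title/Here!"))
    print(comment_path("c1", 0, "AskReddit"))
```

The slug keeps the original capitalization of the title, so real file names will not be lowercased the way the simplified tree above suggests.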

Frontmatter Metadata

Each Markdown file includes comprehensive metadata:

yaml
id: abc123
type: submission
title: "My Reddit Post Title"
subreddit: AskReddit
author: your_username
created_utc: "2025-01-15T10:30:45+00:00"
score: 42
num_comments: 128
permalink: https://reddit.com/r/AskReddit/comments/abc123/my_reddit_post_title/
url: https://example.com/image.jpg
over_18: false
is_self: false
distinguished: null
stickied: false
edited: false
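To read this metadata back, the python-frontmatter package (frontmatter.load) is the robust choice. For quick scripts without that dependency, a deliberately simplified reader can be sketched as below. It assumes the metadata sits between two `---` lines at the top of the file and only handles flat `key: value` pairs; `split_frontmatter` is a hypothetical helper, not a full YAML parser.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a Markdown document into (metadata, body).

    Values are returned as strings; nested YAML is ignored.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # no closing delimiter: treat everything as body
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() and not line.startswith((" ", "-")):
            meta[key.strip()] = value.strip().strip("'\"")
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```

Splitting on the first colon only is what keeps URL values such as the permalink intact.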

Troubleshooting Common Issues

Rate Limiting Errors

If you see 429 errors, the script handles this automatically. However, extremely large exports may take time due to API limits.

Authentication Problems

Verify your praw.ini values match exactly what's shown in Reddit's app settings. No extra spaces!

Missing Media Downloads

Some older posts may have media that's no longer available. Check your export logs for details.

Large Exports Taking Forever

Use --limit for smaller test runs first. For production exports, consider running during off-peak hours.

Permission Issues

Ensure your output directory is writable and you have sufficient disk space.

Post-Export Operations

Data Analysis

With your data in Markdown, you can use various tools:

  • grep for searching: grep -r "search term" reddit_export/
  • wc for statistics: find reddit_export/ -name "*.md" | wc -l
  • pandoc for conversion: Convert to HTML, PDF, or other formats
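The index.json file written by the script is also a convenient starting point for analysis, since it holds the frontmatter of every item exported in a run. As a small sketch, the following counts exported items per subreddit; `items_per_subreddit` is a hypothetical helper and the path in the usage example is the default output directory.

```python
import json
from collections import Counter
from pathlib import Path


def items_per_subreddit(index_path: Path) -> Counter:
    """Count index entries by their 'subreddit' field."""
    entries = json.loads(Path(index_path).read_text(encoding="utf-8"))
    return Counter(entry.get("subreddit") for entry in entries)


if __name__ == "__main__":
    counts = items_per_subreddit(Path("reddit_export/index.json"))
    for subreddit, n in counts.most_common(10):
        print(f"{n:5d}  r/{subreddit}")
```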

Migration to Other Platforms

The clean Markdown format makes migration easy:

  • Static site generators (Hugo, Jekyll, Eleventy)
  • Note-taking apps (Obsidian, Notion)
  • Personal wikis (MediaWiki, BookStack)

Search & Indexing

Create full-text search indexes:

bash
# Install ripgrep for fast searching
brew install ripgrep      # macOS
sudo apt install ripgrep  # Ubuntu

# Search across all files instantly
rg "artificial intelligence" reddit_export/
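If you would rather stay in Python, for example to feed matches into further processing, a plain substring search over the exported files is easy to sketch. `search_markdown` is a hypothetical helper; it is case-insensitive and far slower than ripgrep on large exports.

```python
from pathlib import Path
from typing import List, Tuple


def search_markdown(root: Path, term: str) -> List[Tuple[Path, int, str]]:
    """Return (file, line number, line) for every line containing `term`."""
    needle = term.lower()
    hits: List[Tuple[Path, int, str]] = []
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path, number, line.strip()))
    return hits


if __name__ == "__main__":
    for path, number, line in search_markdown(Path("reddit_export"), "artificial intelligence"):
        print(f"{path}:{number}: {line}")
```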

Privacy & Security Considerations

  • Store Credentials Securely: Never commit praw.ini to version control
  • Data Privacy: Exported data may contain personal information
  • Storage: Consider encrypting your export directory for added security
  • Cleanup: Delete export data when no longer needed

FAQ (Frequently Asked Questions)

Q: How long does the export take?

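A: Depends on your activity level. Small accounts: minutes. Large accounts with years of history: hours to days. The script shows progress and can resume. Keep in mind that Reddit listings return at most about 1,000 items each, so the oldest posts and comments of very active accounts may not be reachable through the API.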

Q: What's the difference between --saved and regular exports?

A: --saved exports posts/comments you bookmarked. --submissions/--comments export content you created.

Q: Can I export other users' data?

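A: Saved items are only available for the account you authenticate as. Public submissions and comments can technically be fetched for any username, but this guide is aimed at backing up your own account.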

Q: What if I delete a Reddit account?

A: Files you exported before deleting the account stay on your disk. Nothing can be exported once the account is gone, so run the export first.

Q: Does this violate Reddit's Terms of Service?

A: No, this uses official APIs within their guidelines. It's for personal backups.

Q: Can I export private messages?

A: This script focuses on posts/comments. PMs require different API calls.

Q: Why Markdown and not JSON/CSV?

A: Markdown is human-readable, searchable, and works with existing static site tools.

Q: How much storage space is needed?

A: Varies wildly. Text-only: minimal. With media from an active account: hundreds of MB to GB.

Q: Can I modify the script for custom formats?

A: Absolutely! The code is well-documented and modular for customization.

Conclusion: Take Control of Your Reddit Data

In an age where platforms control our digital lives, taking ownership of your data is empowering. This comprehensive Reddit exporter gives you complete control—backup, analyze, migrate, or archive your Reddit history as you see fit.

Whether you're leaving Reddit, starting a personal blog migration, or just want searchable archives of your contributions, this tool provides enterprise-grade reliability with simple execution.

Remember: your digital footprint belongs to you. Regular exports ensure your conversations, ideas, and contributions remain accessible regardless of platform changes.

Start your export today and regain control of your online history!

Have questions or need help troubleshooting? Check the troubleshooting section above or search for solutions in the comments—all exported data remains searchable and accessible.
