mirror of
				https://github.com/KevinMidboe/bulk-downloader-for-reddit.git
				synced 2025-10-29 17:40:15 +00:00 
			
		
		
		
	Merge pull request #30 from aliparlakci/SelfDownloader
- Added self post download feature
- Made the searching process quicker by writing posts to file at the end of the search
- Added long file bug solution to remaining download classes
- Updated the README file to make it minimal
This commit is contained in:
		
							
								
								
									
										80
									
								
								README.md
									
									
									
									
									
								
							
							
						
						
									
									
									
									
									
									
								
							| @@ -5,23 +5,28 @@ This program downloads imgur, gfycat and direct image and video links of saved p | ||||
|  | ||||
| ## Table of Contents | ||||
|  | ||||
| - [What it can do?](#what-it-can-do) | ||||
| - [Requirements](#requirements) | ||||
| - [Setting up the script](#setting-up-the-script) | ||||
|   - [Creating an imgur app](#creating-an-imgur-app) | ||||
| - [Program Modes](#program-modes) | ||||
|   - [saved mode](#saved-mode) | ||||
|   - [submitted mode](#submitted-mode) | ||||
|   - [upvoted mode](#upvoted-mode) | ||||
|   - [subreddit mode](#subreddit-mode) | ||||
|   - [multireddit mode](#multireddit-mode) | ||||
|   - [link mode](#link-mode) | ||||
|   - [log read mode](#log-read-mode) | ||||
| - [Running the script](#running-the-script) | ||||
|   - [Using the command line arguments](#using-the-command-line-arguments) | ||||
|   - [Examples](#examples) | ||||
| - [FAQ](#faq) | ||||
| - [Changelog](#changelog) | ||||
|   - [release-1.0.0](#release-100) | ||||
|  | ||||
| ## What it can do? | ||||
| ### It... | ||||
| - can get posts from: frontpage, subreddits, multireddits, redditor's submissions, upvoted and saved posts; search results or just plain reddit links | ||||
| - sorts post by hot, top, new and so on | ||||
| - downloads imgur albums, gfycat links, [self posts](#i-can-t-open-the-self-posts-) and any link to a direct image | ||||
| - skips the existing ones | ||||
| - puts post titles to file's name | ||||
| - puts every post to its subreddit's folder | ||||
| - saves a reusable copy of the details of the posts it finds so that they can be downloaded again | ||||
| - logs failed ones in a file so that you can try to download them later | ||||
| - can be run with double-clicking on Windows (but I don't recommend it) | ||||
|  | ||||
| ## Requirements | ||||
| - Python 3.x* | ||||
| @@ -49,38 +54,27 @@ It should redirect to a page which shows your **imgur_client_id** and **imgur_cl | ||||
|  | ||||
| ## Program Modes | ||||
| All the program modes are activated with command-line arguments as shown [here](#using-the-command-line-arguments)   | ||||
| ### saved mode | ||||
| In saved mode, the program gets posts from given user's saved posts. | ||||
| ### submitted mode | ||||
| In submitted mode, the program gets posts from given user's submitted posts. | ||||
| ### upvoted mode | ||||
| In upvoted mode, the program gets posts from given user's upvoted posts. | ||||
| ### subreddit mode | ||||
| In subreddit mode, the program gets posts from given subreddits* that is sorted by given type and limited by given number.   | ||||
|    | ||||
| Multiple subreddits can be given | ||||
|    | ||||
| *You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).* | ||||
| ### multireddit mode | ||||
| In multireddit mode, the program gets posts from given user's given multireddit that is sorted by given type and limited by given number.   | ||||
| ### link mode | ||||
| In link mode, the program gets posts from given reddit link.   | ||||
|    | ||||
| You may customize the behaviour with `--sort`, `--time`, `--limit`. | ||||
|    | ||||
| *You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).* | ||||
|    | ||||
| ## log read mode | ||||
| Two log files are created each time *script.py* runs. | ||||
| - **POSTS** Saves all the posts without filtering. | ||||
| - **FAILED** Keeps track of posts that are tried to be downloaded but failed. | ||||
|    | ||||
| In log mode, the program takes a log file which created by itself, reads posts and tries downloading them again. | ||||
|  | ||||
| Running log read mode for FAILED.json file once after the download is complete is **HIGHLY** recommended as unexpected problems may occur. | ||||
| - **saved mode** | ||||
|   - Gets posts from given user's saved posts. | ||||
| - **submitted mode** | ||||
|   - Gets posts from given user's submitted posts. | ||||
| - **upvoted mode** | ||||
|   - Gets posts from given user's upvoted posts. | ||||
| - **subreddit mode** | ||||
|   - Gets posts from the given subreddit or subreddits, sorted by the given type and limited to the given number. | ||||
|   - You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments). | ||||
| - **multireddit mode** | ||||
|   - Gets posts from given user's given multireddit that is sorted by given type and limited by given number.   | ||||
| - **link mode** | ||||
|   - Gets posts from given reddit link.   | ||||
|   - You may customize the behaviour with `--sort`, `--time`, `--limit`. | ||||
|   - You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments). | ||||
| - **log read mode** | ||||
|   - Takes a log file that the script created itself (a JSON file), reads the posts in it and tries downloading them again. | ||||
|   - Running log read mode for FAILED.json file once after the download is complete is **HIGHLY** recommended as unexpected problems may occur. | ||||
|  | ||||
| ## Running the script | ||||
| **WARNING** *DO NOT* let more than *1* instance of script run as it interferes with IMGUR Request Rate.   | ||||
| **DO NOT** let more than one instance of the script run as it interferes with IMGUR Request Rate.   | ||||
|    | ||||
| ### Using the command line arguments | ||||
| If no arguments are passed, the program prompts you for the arguments below, so you can also start the script by double-clicking it (at least on Windows). | ||||
| @@ -89,7 +83,7 @@ Open up the [terminal](https://www.reddit.com/r/NSFW411/comments/8vtnl8/meta_i_m | ||||
|    | ||||
| Run the script.py file from terminal with command-line arguments. Here is the help page:   | ||||
|    | ||||
| **ATTENTION** Use `.\` for current directory and `..\` for upper directory when using short directories, otherwise it might act weird. | ||||
| Use `.\` for current directory and `..\` for upper directory when using short directories, otherwise it might act weird. | ||||
|  | ||||
| ```console | ||||
| $ py -3 script.py --help | ||||
| @@ -166,6 +160,10 @@ py -3 script.py C:\\NEW_FOLDER\\ANOTHER_FOLDER --log UNNAMED_FOLDER\\FAILED.json | ||||
| ### I can't startup the script no matter what. | ||||
| - Try `python3`, `python` or `py -3`, as the name of the Python command differs between systems | ||||
|  | ||||
| ### I can't open the self posts. | ||||
| - Self posts are stored on reddit as Markdown, so the script downloads them as Markdown in order to keep their styling. There is a Chrome extension [here](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with their styling. Install it and open the files with Chrome. | ||||
|  | ||||
| ## Changelog | ||||
| ### v1.0.0 | ||||
| - Initial release | ||||
| ### 10/07/2018 | ||||
| - Added support for *self* post | ||||
| - Now getting posts is quicker | ||||
|   | ||||
							
								
								
									
										17
									
								
								script.py
									
									
									
									
									
								
							
							
						
						
									
									
									
									
									
									
								
							| @@ -11,7 +11,7 @@ import sys | ||||
| import time | ||||
| from pathlib import Path, PurePath | ||||
|  | ||||
| from src.downloader import Direct, Gfycat, Imgur | ||||
| from src.downloader import Direct, Gfycat, Imgur, Self | ||||
| from src.parser import LinkDesigner | ||||
| from src.searcher import getPosts | ||||
| from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector, | ||||
| @@ -452,6 +452,21 @@ def download(submissions): | ||||
|                 FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]}) | ||||
|                 downloadedCount -= 1 | ||||
|          | ||||
|         elif submissions[i]['postType'] == 'self': | ||||
|             print("SELF") | ||||
|             try: | ||||
|                 Self(directory,submissions[i]) | ||||
|  | ||||
|             except FileAlreadyExistsError: | ||||
|                 print("It already exists") | ||||
|                 downloadedCount -= 1 | ||||
|                 duplicates += 1 | ||||
|  | ||||
|             except Exception as exception: | ||||
|                 print(exception) | ||||
|                 FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]}) | ||||
|                 downloadedCount -= 1 | ||||
|  | ||||
|         else: | ||||
|             print("No match found, skipping...") | ||||
|             downloadedCount -= 1 | ||||
|   | ||||
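The hunk above records each failed download as `{index: [error message, submission]}`, which is what log read mode later replays. As a rough sketch of how such a log could be read back, assuming the file is a single flat JSON object of those entries (the on-disk layout produced by `jsonFile` is not shown in this diff, and `load_failed_posts` is a hypothetical helper, not part of the script):

```python
import json


def load_failed_posts(path):
    """Return the submission dicts stored in a FAILED-style log file.

    Assumes a flat JSON object whose values look like
    [error_message, submission]; anything else is skipped.
    """
    with open(path, encoding="utf-8") as handle:
        entries = json.load(handle)

    posts = []
    for value in entries.values():
        # Keep only entries shaped like [message, {submission details}].
        if isinstance(value, list) and len(value) == 2 and isinstance(value[1], dict):
            posts.append(value[1])
    return posts
```

The returned list has the same shape as the `submissions` argument of `download`, so in principle it could be passed back in for a retry.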
| @@ -1,3 +1,4 @@ | ||||
| import io | ||||
| import os | ||||
| import sys | ||||
| import urllib.request | ||||
| @@ -16,7 +17,7 @@ except ModuleNotFoundError: | ||||
|     install("imgurpython") | ||||
|     from imgurpython import * | ||||
|  | ||||
|  | ||||
| VanillaPrint = print | ||||
| print = printToFile | ||||
|  | ||||
| def dlProgress(count, blockSize, totalSize): | ||||
| @@ -294,3 +295,45 @@ class Direct: | ||||
|             tempDir = directory / (POST['postId']+".tmp") | ||||
|  | ||||
|             getFile(fileDir,tempDir,POST['postURL']) | ||||
|  | ||||
| class Self: | ||||
|     def __init__(self,directory,post): | ||||
|         if not os.path.exists(directory): os.makedirs(directory) | ||||
|  | ||||
|         title = nameCorrector(post['postTitle']) | ||||
|         print(title+"_"+post['postId']+".md") | ||||
|  | ||||
|         fileDir = title+"_"+post['postId']+".md" | ||||
|         fileDir = directory / fileDir | ||||
|          | ||||
|         if Path.is_file(fileDir): | ||||
|             raise FileAlreadyExistsError | ||||
|              | ||||
|         try: | ||||
|             self.writeToFile(fileDir,post) | ||||
|         except FileNotFoundError: | ||||
|             fileDir = post['postId']+".md" | ||||
|             fileDir = directory / fileDir | ||||
|  | ||||
|             self.writeToFile(fileDir,post) | ||||
|      | ||||
|     @staticmethod | ||||
|     def writeToFile(directory,post): | ||||
|  | ||||
|         content = ("## [" | ||||
|                    + post["postTitle"] | ||||
|                    + "](" | ||||
|                    + post["postURL"] | ||||
|                    + ")\n" | ||||
|                    + post["postContent"] | ||||
|                    + "\n\n---\n\n" | ||||
|                    + "submitted by [u/" | ||||
|                    + post["postSubmitter"] | ||||
|                    + "](https://www.reddit.com/user/" | ||||
|                    + post["postSubmitter"] | ||||
|                    + ")") | ||||
|  | ||||
|         with io.open(directory,"w",encoding="utf-8") as FILE: | ||||
|             VanillaPrint(content,file=FILE) | ||||
|          | ||||
|         print("Downloaded") | ||||
|   | ||||
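The `Self` class above builds a Markdown document from the post and falls back to an ID-only file name when the titled name cannot be written. A minimal standalone sketch of the same idea follows. It makes some assumptions that differ from the commit: it raises the built-in `FileExistsError` instead of the project's `FileAlreadyExistsError`, it catches `OSError` rather than only `FileNotFoundError` (over-long names raise a different subclass on some systems), and `safe_title` stands in for the output of `nameCorrector`.

```python
import os
from pathlib import Path


def render_self_post(post):
    """Build the Markdown body in the same layout as Self.writeToFile."""
    return (
        "## [" + post["postTitle"] + "](" + post["postURL"] + ")\n"
        + post["postContent"]
        + "\n\n---\n\n"
        + "submitted by [u/" + post["postSubmitter"]
        + "](https://www.reddit.com/user/" + post["postSubmitter"] + ")"
    )


def save_self_post(directory, post, safe_title):
    """Write the post as <title>_<id>.md, or <id>.md if that name fails."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)

    target = directory / (safe_title + "_" + post["postId"] + ".md")
    # os.path.isfile returns False instead of raising for unusable names.
    if os.path.isfile(target):
        raise FileExistsError(str(target))

    body = render_self_post(post) + "\n"
    try:
        target.write_text(body, encoding="utf-8")
    except OSError:
        # Name too long or otherwise unwritable: retry with the ID only.
        target = directory / (post["postId"] + ".md")
        target.write_text(body, encoding="utf-8")
    return target
```

Separating the rendering from the file handling keeps the Markdown layout testable without touching the disk.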
| @@ -308,6 +308,10 @@ def redditSearcher(posts,SINGLE_POST=False): | ||||
|     imgurCount = 0 | ||||
|     global directCount | ||||
|     directCount = 0 | ||||
|     global selfCount | ||||
|     selfCount = 0 | ||||
|  | ||||
|     allPosts = {} | ||||
|  | ||||
|     postsFile = createLogFile("POSTS") | ||||
|  | ||||
| @@ -356,13 +360,15 @@ def redditSearcher(posts,SINGLE_POST=False): | ||||
|                 printSubmission(submission,subCount,orderCount) | ||||
|                 subList.append(details) | ||||
|  | ||||
|             postsFile.add({subCount:[details]}) | ||||
|             allPosts = {**allPosts,**details} | ||||
|          | ||||
|         postsFile.add(allPosts) | ||||
|  | ||||
|     if not len(subList) == 0:     | ||||
|         print( | ||||
|             "\nTotal of {} submissions found!\n"\ | ||||
|             "{} GFYCATs, {} IMGURs and {} DIRECTs\n" | ||||
|             .format(len(subList),gfycatCount,imgurCount,directCount) | ||||
|             "{} GFYCATs, {} IMGURs, {} DIRECTs and {} SELF POSTS\n" | ||||
|             .format(len(subList),gfycatCount,imgurCount,directCount,selfCount) | ||||
|         ) | ||||
|         return subList | ||||
|     else: | ||||
| @@ -372,6 +378,7 @@ def checkIfMatching(submission): | ||||
|     global gfycatCount | ||||
|     global imgurCount | ||||
|     global directCount | ||||
|     global selfCount | ||||
|  | ||||
|     try: | ||||
|         details = {'postId':submission.id, | ||||
| @@ -397,13 +404,15 @@ def checkIfMatching(submission): | ||||
|             imgurCount += 1 | ||||
|             return details | ||||
|  | ||||
|     elif isDirectLink(submission.url) is True: | ||||
|     elif isDirectLink(submission.url): | ||||
|         details['postType'] = 'direct' | ||||
|         directCount += 1 | ||||
|         return details | ||||
|  | ||||
|     elif submission.is_self: | ||||
|         details['postType'] = 'self' | ||||
|         details['postContent'] = submission.selftext | ||||
|         selfCount += 1 | ||||
|         return details | ||||
|  | ||||
| def printSubmission(SUB,validNumber,totalNumber): | ||||
|   | ||||
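The `checkIfMatching` hunk above adds self posts as the last branch of the type check, after direct links. The sketch below illustrates that ordering without the global counters. The matching rules are assumptions: the gfycat and imgur conditions and the body of `isDirectLink` are not visible in this diff, so the substring tests and the `DIRECT_EXTENSIONS` list are placeholders, and `classify` and `tally` are hypothetical names.

```python
from collections import Counter

# Assumed extension list; the real isDirectLink is not shown in the diff.
DIRECT_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".mp4", ".webm")


def classify(url, is_self, selftext=""):
    """Return the type details for a post, or None if nothing matches."""
    lowered = url.lower()
    if "gfycat" in lowered:
        return {"postType": "gfycat"}
    if "imgur" in lowered:
        return {"postType": "imgur"}
    if lowered.endswith(DIRECT_EXTENSIONS):
        return {"postType": "direct"}
    if is_self:
        # Self posts carry their Markdown body along, as in the commit.
        return {"postType": "self", "postContent": selftext}
    return None


def tally(posts):
    """Count posts per type, replacing the per-type global counters."""
    return Counter(post["postType"] for post in posts)
```

Returning the counts from a `Counter` avoids adding a new `global` for every post type, at the cost of a second pass over the results.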