mirror of https://github.com/KevinMidboe/bulk-downloader-for-reddit.git (synced 2025-10-29 17:40:15 +00:00)

	Merge pull request #30 from aliparlakci/SelfDownloader
- Added self post download feature
- Made the searching process quicker by writing posts to file at the end of the search
- Added long file bug solution to remaining download classes
- Updated the README file to make it minimal

README.md
							| @@ -5,23 +5,28 @@ This program downloads imgur, gfycat and direct image and video links of saved p | |||||||
|  |  | ||||||
| ## Table of Contents | ## Table of Contents | ||||||
|  |  | ||||||
|  | - [What it can do?](#what-it-can-do) | ||||||
| - [Requirements](#requirements) | - [Requirements](#requirements) | ||||||
| - [Setting up the script](#setting-up-the-script) | - [Setting up the script](#setting-up-the-script) | ||||||
|   - [Creating an imgur app](#creating-an-imgur-app) |   - [Creating an imgur app](#creating-an-imgur-app) | ||||||
| - [Program Modes](#program-modes) | - [Program Modes](#program-modes) | ||||||
|   - [saved mode](#saved-mode) |  | ||||||
|   - [submitted mode](#submitted-mode) |  | ||||||
|   - [upvoted mode](#upvoted-mode) |  | ||||||
|   - [subreddit mode](#subreddit-mode) |  | ||||||
|   - [multireddit mode](#multireddit-mode) |  | ||||||
|   - [link mode](#link-mode) |  | ||||||
|   - [log read mode](#log-read-mode) |  | ||||||
| - [Running the script](#running-the-script) | - [Running the script](#running-the-script) | ||||||
|   - [Using the command line arguments](#using-the-command-line-arguments) |   - [Using the command line arguments](#using-the-command-line-arguments) | ||||||
|   - [Examples](#examples) |   - [Examples](#examples) | ||||||
| - [FAQ](#faq) | - [FAQ](#faq) | ||||||
| - [Changelog](#changelog) | - [Changelog](#changelog) | ||||||
|   - [release-1.0.0](#release-100) |  | ||||||
|  | ## What it can do? | ||||||
|  | ### It... | ||||||
|  | - can get posts from: frontpage, subreddits, multireddits, redditor's submissions, upvoted and saved posts; search results or just plain reddit links | ||||||
|  | - sorts post by hot, top, new and so on | ||||||
|  | - downloads imgur albums, gfycat links, [self posts](#i-can-t-open-the-self-posts-) and any link to a direct image | ||||||
|  | - skips the existing ones | ||||||
|  | - puts post titles to file's name | ||||||
|  | - puts every post to its subreddit's folder | ||||||
|  | - saves reusable a copy of posts' details that are found so that they can be re-downloaded again | ||||||
|  | - logs failed ones in a file to so that you can try to download them later | ||||||
|  | - can be run with double-clicking on Windows (but I don't recommend it) | ||||||
|  |  | ||||||
| ## Requirements | ## Requirements | ||||||
| - Python 3.x* | - Python 3.x* | ||||||
| @@ -49,38 +54,27 @@ It should redirect to a page which shows your **imgur_client_id** and **imgur_cl | |||||||
|  |  | ||||||
| ## Program Modes | ## Program Modes | ||||||
| All the program modes are activated with command-line arguments as shown [here](#using-the-command-line-arguments)   | All the program modes are activated with command-line arguments as shown [here](#using-the-command-line-arguments)   | ||||||
| ### saved mode | - **saved mode** | ||||||
| In saved mode, the program gets posts from given user's saved posts. |   - Gets posts from given user's saved posts. | ||||||
| ### submitted mode | - **submitted mode** | ||||||
| In submitted mode, the program gets posts from given user's submitted posts. |   - Gets posts from given user's submitted posts. | ||||||
| ### upvoted mode | - **upvoted mode** | ||||||
| In submitted mode, the program gets posts from given user's upvoted posts. |   - Gets posts from given user's upvoted posts. | ||||||
| ### subreddit mode | - **subreddit mode** | ||||||
| In subreddit mode, the program gets posts from given subreddits* that is sorted by given type and limited by given number.   |   - Gets posts from given subreddit or subreddits that is sorted by given type and limited by given number. | ||||||
|    |   - You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments). | ||||||
| Multiple subreddits can be given | - **multireddit mode** | ||||||
|    |   - Gets posts from given user's given multireddit that is sorted by given type and limited by given number.   | ||||||
| *You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).* | - **link mode** | ||||||
| ### multireddit mode |   - Gets posts from given reddit link.   | ||||||
| In multireddit mode, the program gets posts from given user's given multireddit that is sorted by given type and limited by given number.   |   - You may customize the behaviour with `--sort`, `--time`, `--limit`. | ||||||
| ### link mode |   - You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments). | ||||||
| In link mode, the program gets posts from given reddit link.   | - **log read mode** | ||||||
|    |   - Takes a log file which created by itself (json files), reads posts and tries downloading them again. | ||||||
| You may customize the behaviour with `--sort`, `--time`, `--limit`. |   - Running log read mode for FAILED.json file once after the download is complete is **HIGHLY** recommended as unexpected problems may occur. | ||||||
|    |  | ||||||
| *You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).* |  | ||||||
|    |  | ||||||
| ## log read mode |  | ||||||
| Two log files are created each time *script.py* runs. |  | ||||||
| - **POSTS** Saves all the posts without filtering. |  | ||||||
| - **FAILED** Keeps track of posts that are tried to be downloaded but failed. |  | ||||||
|    |  | ||||||
| In log mode, the program takes a log file which created by itself, reads posts and tries downloading them again. |  | ||||||
|  |  | ||||||
| Running log read mode for FAILED.json file once after the download is complete is **HIGHLY** recommended as unexpected problems may occur. |  | ||||||
|  |  | ||||||
| ## Running the script | ## Running the script | ||||||
| **WARNING** *DO NOT* let more than *1* instance of script run as it interferes with IMGUR Request Rate.   | **DO NOT** let more than one instance of the script run as it interferes with IMGUR Request Rate.   | ||||||
|    |    | ||||||
| ### Using the command line arguments | ### Using the command line arguments | ||||||
| If no arguments are passed program will prompt you for arguments below which means you may start up the script with double-clicking on it (at least on Windows for sure). | If no arguments are passed program will prompt you for arguments below which means you may start up the script with double-clicking on it (at least on Windows for sure). | ||||||
| @@ -89,7 +83,7 @@ Open up the [terminal](https://www.reddit.com/r/NSFW411/comments/8vtnl8/meta_i_m | |||||||
|    |    | ||||||
| Run the script.py file from terminal with command-line arguments. Here is the help page:   | Run the script.py file from terminal with command-line arguments. Here is the help page:   | ||||||
|    |    | ||||||
| **ATTENTION** Use `.\` for current directory and `..\` for upper directory when using short directories, otherwise it might act weird. | Use `.\` for current directory and `..\` for upper directory when using short directories, otherwise it might act weird. | ||||||
|  |  | ||||||
| ```console | ```console | ||||||
| $ py -3 script.py --help | $ py -3 script.py --help | ||||||
| @@ -166,6 +160,10 @@ py -3 script.py C:\\NEW_FOLDER\\ANOTHER_FOLDER --log UNNAMED_FOLDER\\FAILED.json | |||||||
| ### I can't startup the script no matter what. | ### I can't startup the script no matter what. | ||||||
| - Try `python3` or `python` or `py -3` as python have real issues about naming their program | - Try `python3` or `python` or `py -3` as python have real issues about naming their program | ||||||
|  |  | ||||||
|  | ### I can't open the self posts. | ||||||
|  | - Self posts are held at subreddit as Markdown. So, the script downloads them as Markdown in order not to lose their stylings. However, there is a great Chrome extension [here](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with its styling. Install it and open the files with Chrome. | ||||||
|  |  | ||||||
| ## Changelog | ## Changelog | ||||||
| ### v1.0.0 | ### 10/07/2018 | ||||||
| - Initial release | - Added support for *self* post | ||||||
|  | - Now getting posts is quicker | ||||||
|   | |||||||
							
								
								
									
script.py
							| @@ -11,7 +11,7 @@ import sys | |||||||
| import time | import time | ||||||
| from pathlib import Path, PurePath | from pathlib import Path, PurePath | ||||||
|  |  | ||||||
| from src.downloader import Direct, Gfycat, Imgur | from src.downloader import Direct, Gfycat, Imgur, Self | ||||||
| from src.parser import LinkDesigner | from src.parser import LinkDesigner | ||||||
| from src.searcher import getPosts | from src.searcher import getPosts | ||||||
| from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector, | from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector, | ||||||
| @@ -451,7 +451,22 @@ def download(submissions): | |||||||
|                 print(exception) |                 print(exception) | ||||||
|                 FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]}) |                 FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]}) | ||||||
|                 downloadedCount -= 1 |                 downloadedCount -= 1 | ||||||
|                  |          | ||||||
|  |         elif submissions[i]['postType'] == 'self': | ||||||
|  |             print("SELF") | ||||||
|  |             try: | ||||||
|  |                 Self(directory,submissions[i]) | ||||||
|  |  | ||||||
|  |             except FileAlreadyExistsError: | ||||||
|  |                 print("It already exists") | ||||||
|  |                 downloadedCount -= 1 | ||||||
|  |                 duplicates += 1 | ||||||
|  |  | ||||||
|  |             except Exception as exception: | ||||||
|  |                 print(exception) | ||||||
|  |                 FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]}) | ||||||
|  |                 downloadedCount -= 1 | ||||||
|  |  | ||||||
|         else: |         else: | ||||||
|             print("No match found, skipping...") |             print("No match found, skipping...") | ||||||
|             downloadedCount -= 1 |             downloadedCount -= 1 | ||||||
|   | |||||||
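
The new `self` branch in `download` follows the same bookkeeping as the existing branches: a duplicate lowers the downloaded count and raises the duplicate count, and any other exception is recorded for a later retry. A minimal sketch of that pattern is below. The `download_all` function, the `handlers` mapping, and the starting value of the counter are assumptions made for illustration; `FileAlreadyExistsError` is redefined here only so the snippet is self-contained.

```python
class FileAlreadyExistsError(Exception):
    """Stand-in for the project's exception of the same name."""


def download_all(submissions, handlers):
    """Dispatch each post to its handler and tally the outcomes.

    `handlers` maps a postType string to a callable. Both the mapping and
    this function are hypothetical and are not part of script.py.
    """
    downloaded = len(submissions)  # assumed starting point, decremented on failure
    duplicates = 0
    failed = {}

    for index, post in enumerate(submissions, start=1):
        handler = handlers.get(post["postType"])
        if handler is None:
            print("No match found, skipping...")
            downloaded -= 1
            continue
        try:
            handler(post)
        except FileAlreadyExistsError:
            print("It already exists")
            downloaded -= 1
            duplicates += 1
        except Exception as exception:
            # Keep the reason and the post so it can be retried later.
            print(exception)
            failed[index] = [str(exception), post]
            downloaded -= 1

    return downloaded, duplicates, failed
```

A lookup table like this would avoid adding another `elif` for each new post type, though the commit itself keeps the explicit branches.
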
| @@ -1,3 +1,4 @@ | |||||||
|  | import io | ||||||
| import os | import os | ||||||
| import sys | import sys | ||||||
| import urllib.request | import urllib.request | ||||||
| @@ -16,7 +17,7 @@ except ModuleNotFoundError: | |||||||
|     install("imgurpython") |     install("imgurpython") | ||||||
|     from imgurpython import * |     from imgurpython import * | ||||||
|  |  | ||||||
|  | VanillaPrint = print | ||||||
| print = printToFile | print = printToFile | ||||||
|  |  | ||||||
| def dlProgress(count, blockSize, totalSize): | def dlProgress(count, blockSize, totalSize): | ||||||
| @@ -294,3 +295,45 @@ class Direct: | |||||||
|             tempDir = directory / (POST['postId']+".tmp") |             tempDir = directory / (POST['postId']+".tmp") | ||||||
|  |  | ||||||
|             getFile(fileDir,tempDir,POST['postURL']) |             getFile(fileDir,tempDir,POST['postURL']) | ||||||
|  |  | ||||||
|  | class Self: | ||||||
|  |     def __init__(self,directory,post): | ||||||
|  |         if not os.path.exists(directory): os.makedirs(directory) | ||||||
|  |  | ||||||
|  |         title = nameCorrector(post['postTitle']) | ||||||
|  |         print(title+"_"+post['postId']+".md") | ||||||
|  |  | ||||||
|  |         fileDir = title+"_"+post['postId']+".md" | ||||||
|  |         fileDir = directory / fileDir | ||||||
|  |          | ||||||
|  |         if Path.is_file(fileDir): | ||||||
|  |             raise FileAlreadyExistsError | ||||||
|  |              | ||||||
|  |         try: | ||||||
|  |             self.writeToFile(fileDir,post) | ||||||
|  |         except FileNotFoundError: | ||||||
|  |             fileDir = post['postId']+".md" | ||||||
|  |             fileDir = directory / fileDir | ||||||
|  |  | ||||||
|  |             self.writeToFile(fileDir,post) | ||||||
|  |      | ||||||
|  |     @staticmethod | ||||||
|  |     def writeToFile(directory,post): | ||||||
|  |  | ||||||
|  |         content = ("## [" | ||||||
|  |                    + post["postTitle"] | ||||||
|  |                    + "](" | ||||||
|  |                    + post["postURL"] | ||||||
|  |                    + ")\n" | ||||||
|  |                    + post["postContent"] | ||||||
|  |                    + "\n\n---\n\n" | ||||||
|  |                    + "submitted by [u/" | ||||||
|  |                    + post["postSubmitter"] | ||||||
|  |                    + "](https://www.reddit.com/user/" | ||||||
|  |                    + post["postSubmitter"] | ||||||
|  |                    + ")") | ||||||
|  |  | ||||||
|  |         with io.open(directory,"w",encoding="utf-8") as FILE: | ||||||
|  |             VanillaPrint(content,file=FILE) | ||||||
|  |          | ||||||
|  |         print("Downloaded") | ||||||
|   | |||||||
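
The `Self` class above renders a post as Markdown and, if the title-based file name cannot be created, retries with a name built from the post ID alone. The sketch below shows the same idea in standalone form. `build_markdown`, `save_self_post`, and the `safe_title` parameter (standing in for the output of `nameCorrector`) are hypothetical names. Catching `OSError` instead of only `FileNotFoundError` is an assumption, made so that the fallback also triggers on platforms that report an over-long name with a different error. The existence check that raises `FileAlreadyExistsError` is left out for brevity.

```python
from pathlib import Path


def build_markdown(post):
    """Render a self post as Markdown, mirroring the layout in writeToFile."""
    return (
        f"## [{post['postTitle']}]({post['postURL']})\n"
        f"{post['postContent']}"
        "\n\n---\n\n"
        f"submitted by [u/{post['postSubmitter']}]"
        f"(https://www.reddit.com/user/{post['postSubmitter']})"
    )


def save_self_post(directory, post, safe_title):
    """Write the post to <title>_<id>.md, falling back to <id>.md.

    `safe_title` is assumed to be already sanitised for use in a file name.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    content = build_markdown(post) + "\n"

    target = directory / f"{safe_title}_{post['postId']}.md"
    try:
        target.write_text(content, encoding="utf-8")
    except OSError:
        # The title made the path unusable (for example, too long),
        # so fall back to the short, ID-only name.
        target = directory / f"{post['postId']}.md"
        target.write_text(content, encoding="utf-8")
    return target
```

Returning the final path lets a caller report which of the two names was actually used.
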
| @@ -308,6 +308,10 @@ def redditSearcher(posts,SINGLE_POST=False): | |||||||
|     imgurCount = 0 |     imgurCount = 0 | ||||||
|     global directCount |     global directCount | ||||||
|     directCount = 0 |     directCount = 0 | ||||||
|  |     global selfCount | ||||||
|  |     selfCount = 0 | ||||||
|  |  | ||||||
|  |     allPosts = {} | ||||||
|  |  | ||||||
|     postsFile = createLogFile("POSTS") |     postsFile = createLogFile("POSTS") | ||||||
|  |  | ||||||
| @@ -356,13 +360,15 @@ def redditSearcher(posts,SINGLE_POST=False): | |||||||
|                 printSubmission(submission,subCount,orderCount) |                 printSubmission(submission,subCount,orderCount) | ||||||
|                 subList.append(details) |                 subList.append(details) | ||||||
|  |  | ||||||
|             postsFile.add({subCount:[details]}) |             allPosts = {**allPosts,**details} | ||||||
|  |          | ||||||
|  |         postsFile.add(allPosts) | ||||||
|  |  | ||||||
|     if not len(subList) == 0:     |     if not len(subList) == 0:     | ||||||
|         print( |         print( | ||||||
|             "\nTotal of {} submissions found!\n"\ |             "\nTotal of {} submissions found!\n"\ | ||||||
|             "{} GFYCATs, {} IMGURs and {} DIRECTs\n" |             "{} GFYCATs, {} IMGURs, {} DIRECTs and {} SELF POSTS\n" | ||||||
|             .format(len(subList),gfycatCount,imgurCount,directCount) |             .format(len(subList),gfycatCount,imgurCount,directCount,selfCount) | ||||||
|         ) |         ) | ||||||
|         return subList |         return subList | ||||||
|     else: |     else: | ||||||
| @@ -372,6 +378,7 @@ def checkIfMatching(submission): | |||||||
|     global gfycatCount |     global gfycatCount | ||||||
|     global imgurCount |     global imgurCount | ||||||
|     global directCount |     global directCount | ||||||
|  |     global selfCount | ||||||
|  |  | ||||||
|     try: |     try: | ||||||
|         details = {'postId':submission.id, |         details = {'postId':submission.id, | ||||||
| @@ -397,13 +404,15 @@ def checkIfMatching(submission): | |||||||
|             imgurCount += 1 |             imgurCount += 1 | ||||||
|             return details |             return details | ||||||
|  |  | ||||||
|     elif isDirectLink(submission.url) is True: |     elif isDirectLink(submission.url): | ||||||
|         details['postType'] = 'direct' |         details['postType'] = 'direct' | ||||||
|         directCount += 1 |         directCount += 1 | ||||||
|         return details |         return details | ||||||
|  |  | ||||||
|     elif submission.is_self: |     elif submission.is_self: | ||||||
|         details['postType'] = 'self' |         details['postType'] = 'self' | ||||||
|  |         details['postContent'] = submission.selftext | ||||||
|  |         selfCount += 1 | ||||||
|         return details |         return details | ||||||
|  |  | ||||||
| def printSubmission(SUB,validNumber,totalNumber): | def printSubmission(SUB,validNumber,totalNumber): | ||||||
|   | |||||||
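
The searcher change replaces one log write per post with a single write after the search, which is where the speed-up named in the commit message comes from. A minimal sketch of that batching is below. `JsonLog` is a hypothetical stand-in for the project's `jsonFile` helper, whose real interface is not shown in this diff, and `log_posts` is likewise an illustrative name. The sketch keys each post by its counter, as the removed `postsFile.add({subCount:[details]})` call did. That is a deliberate difference from the committed `{**allPosts,**details}` merge: if `details` is a flat dictionary of post fields, merging it directly would appear to keep only the values of the last post.

```python
import json


class JsonLog:
    """Tiny stand-in for the project's jsonFile helper (interface assumed)."""

    def __init__(self, path):
        self.path = path
        self.writes = 0  # counts full rewrites of the file

    def add(self, entries):
        """Merge `entries` into the file, rewriting it in full."""
        try:
            with open(self.path, encoding="utf-8") as handle:
                data = json.load(handle)
        except FileNotFoundError:
            data = {}
        data.update(entries)
        with open(self.path, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=4)
        self.writes += 1


def log_posts(log, found_posts):
    """Collect every post under its own counter, then write once."""
    all_posts = {}
    for count, details in enumerate(found_posts, start=1):
        # JSON object keys are strings, so use string keys from the start.
        all_posts[str(count)] = [details]
    log.add(all_posts)
    return all_posts
```

With this shape the file is read and rewritten once per search instead of once per post, at the cost of losing the partial log if the search is interrupted before the final write.
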