Merge pull request #30 from aliparlakci/SelfDownloader

- Added a self post download feature
- Made searching quicker by writing posts to the log file at the end of the search
- Applied the long-file-name bug fix to the remaining download classes
- Updated the README file to make it more minimal
This commit is contained in:
aliparlakci
2018-07-10 02:46:37 +03:00
committed by GitHub
4 changed files with 113 additions and 48 deletions

View File

@@ -5,23 +5,28 @@ This program downloads imgur, gfycat and direct image and video links of saved p
 ## Table of Contents
+- [What can it do?](#what-can-it-do)
 - [Requirements](#requirements)
 - [Setting up the script](#setting-up-the-script)
   - [Creating an imgur app](#creating-an-imgur-app)
 - [Program Modes](#program-modes)
-  - [saved mode](#saved-mode)
-  - [submitted mode](#submitted-mode)
-  - [upvoted mode](#upvoted-mode)
-  - [subreddit mode](#subreddit-mode)
-  - [multireddit mode](#multireddit-mode)
-  - [link mode](#link-mode)
-  - [log read mode](#log-read-mode)
 - [Running the script](#running-the-script)
   - [Using the command line arguments](#using-the-command-line-arguments)
   - [Examples](#examples)
 - [FAQ](#faq)
 - [Changelog](#changelog)
-  - [release-1.0.0](#release-100)
+## What can it do?
+### It...
+- can get posts from: the frontpage, subreddits, multireddits, a redditor's submissions, upvoted and saved posts, search results, or plain reddit links
+- sorts posts by hot, top, new and so on
+- downloads imgur albums, gfycat links, [self posts](#i-can-t-open-the-self-posts-) and any link to a direct image
+- skips posts that already exist
+- puts post titles in file names
+- puts every post in its subreddit's folder
+- saves a reusable copy of the details of found posts so that they can be downloaded again
+- logs failed ones in a file so that you can try to download them later
+- can be run by double-clicking on Windows (though I don't recommend it)
 ## Requirements
 - Python 3.x*
@@ -49,38 +54,27 @@ It should redirect to a page which shows your **imgur_client_id** and **imgur_cl
 ## Program Modes
 All the program modes are activated with command-line arguments as shown [here](#using-the-command-line-arguments)
-### saved mode
-In saved mode, the program gets posts from given user's saved posts.
-### submitted mode
-In submitted mode, the program gets posts from given user's submitted posts.
-### upvoted mode
-In submitted mode, the program gets posts from given user's upvoted posts.
-### subreddit mode
-In subreddit mode, the program gets posts from given subreddits* that is sorted by given type and limited by given number.
-Multiple subreddits can be given
-*You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).*
-### multireddit mode
-In multireddit mode, the program gets posts from given user's given multireddit that is sorted by given type and limited by given number.
-### link mode
-In link mode, the program gets posts from given reddit link.
-You may customize the behaviour with `--sort`, `--time`, `--limit`.
-*You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).*
-## log read mode
-Two log files are created each time *script.py* runs.
-- **POSTS** Saves all the posts without filtering.
-- **FAILED** Keeps track of posts that are tried to be downloaded but failed.
-In log mode, the program takes a log file which created by itself, reads posts and tries downloading them again.
-Running log read mode for FAILED.json file once after the download is complete is **HIGHLY** recommended as unexpected problems may occur.
+- **saved mode**
+  - Gets posts from the given user's saved posts.
+- **submitted mode**
+  - Gets posts from the given user's submitted posts.
+- **upvoted mode**
+  - Gets posts from the given user's upvoted posts.
+- **subreddit mode**
+  - Gets posts from the given subreddit or subreddits, sorted by the given type and limited to the given number.
+  - You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).
+- **multireddit mode**
+  - Gets posts from the given user's given multireddit, sorted by the given type and limited to the given number.
+- **link mode**
+  - Gets posts from the given reddit link.
+  - You may customize the behaviour with `--sort`, `--time`, `--limit`.
+  - You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).
+- **log read mode**
+  - Takes a log file which the script itself created (a JSON file), reads the posts in it and tries downloading them again.
+  - Running log read mode on the FAILED.json file once after a download finishes is **HIGHLY** recommended, as unexpected problems may occur.
 ## Running the script
-**WARNING** *DO NOT* let more than *1* instance of script run as it interferes with IMGUR Request Rate.
+**DO NOT** let more than one instance of the script run, as it interferes with the imgur request rate.
 ### Using the command line arguments
 If no arguments are passed, the program will prompt you for the arguments below, which means you may start the script by double-clicking it (at least on Windows).
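Log read mode, described above, re-reads one of the JSON log files and retries the posts in it. A minimal sketch of that reading step, assuming the `{index: [error message, post details]}` layout that `download()` writes to the FAILED file in this PR (`load_failed_posts` is a hypothetical helper name, not project code):

```python
import json

def load_failed_posts(path):
    # Read a FAILED-style log file and return just the post-detail dicts,
    # ready to be fed back into the downloader.
    # Assumed layout: {"1": ["<error message>", {<post details>}], ...}
    with open(path, "r", encoding="utf-8") as f:
        log = json.load(f)
    return [entry[1] for entry in log.values()]
```

The error messages are dropped here; only the post details are needed for a retry.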
@@ -89,7 +83,7 @@ Open up the [terminal](https://www.reddit.com/r/NSFW411/comments/8vtnl8/meta_i_m
 Run the script.py file from the terminal with command-line arguments. Here is the help page:
-**ATTENTION** Use `.\` for current directory and `..\` for upper directory when using short directories, otherwise it might act weird.
+Use `.\` for the current directory and `..\` for the parent directory when using relative paths, otherwise it may misbehave.
 ```console
 $ py -3 script.py --help
@@ -166,6 +160,10 @@ py -3 script.py C:\\NEW_FOLDER\\ANOTHER_FOLDER --log UNNAMED_FOLDER\\FAILED.json
 ### I can't start up the script no matter what.
 - Try `python3` or `python` or `py -3`, as Python has real naming issues with its executable
+### I can't open the self posts.
+- Self posts are stored on reddit as Markdown, so the script downloads them as Markdown in order not to lose their styling. There is a great Chrome extension [here](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with their styling. Install it and open the downloaded files with Chrome.
 ## Changelog
-### v1.0.0
-- Initial release
+### 10/07/2018
+- Added support for *self* posts
+- Getting posts is now quicker

View File

@@ -11,7 +11,7 @@ import sys
 import time
 from pathlib import Path, PurePath
-from src.downloader import Direct, Gfycat, Imgur
+from src.downloader import Direct, Gfycat, Imgur, Self
 from src.parser import LinkDesigner
 from src.searcher import getPosts
 from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector,
@@ -451,7 +451,22 @@ def download(submissions):
                 print(exception)
                 FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]})
                 downloadedCount -= 1
+        elif submissions[i]['postType'] == 'self':
+            print("SELF")
+            try:
+                Self(directory,submissions[i])
+            except FileAlreadyExistsError:
+                print("It already exists")
+                downloadedCount -= 1
+                duplicates += 1
+            except Exception as exception:
+                print(exception)
+                FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]})
+                downloadedCount -= 1
         else:
             print("No match found, skipping...")
             downloadedCount -= 1
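The `elif` chain in `download()` routes each submission to a downloader class by its `postType`. The same routing can be sketched table-driven (the dict below is a hypothetical illustration, not project code; the handler names are the classes imported in this PR):

```python
# Maps each postType produced by the searcher to its downloader class name.
HANDLERS = {
    "imgur": "Imgur",
    "gfycat": "Gfycat",
    "direct": "Direct",
    "self": "Self",
}

def pick_handler(submission):
    # Return the downloader name for a submission,
    # or None for the "No match found, skipping..." case.
    return HANDLERS.get(submission.get("postType"))
```

A table keeps the per-type error handling (duplicates, failures) in one place instead of repeating it per branch.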

View File

@@ -1,3 +1,4 @@
+import io
 import os
 import sys
 import urllib.request
@@ -16,7 +17,7 @@ except ModuleNotFoundError:
     install("imgurpython")
     from imgurpython import *
+VanillaPrint = print
 print = printToFile

 def dlProgress(count, blockSize, totalSize):
@@ -294,3 +295,45 @@ class Direct:
         tempDir = directory / (POST['postId']+".tmp")
         getFile(fileDir,tempDir,POST['postURL'])
+class Self:
+    def __init__(self,directory,post):
+        if not os.path.exists(directory): os.makedirs(directory)
+
+        title = nameCorrector(post['postTitle'])
+        print(title+"_"+post['postId']+".md")
+
+        fileDir = title+"_"+post['postId']+".md"
+        fileDir = directory / fileDir
+
+        if Path.is_file(fileDir):
+            raise FileAlreadyExistsError
+
+        try:
+            self.writeToFile(fileDir,post)
+        except FileNotFoundError:
+            fileDir = post['postId']+".md"
+            fileDir = directory / fileDir
+            self.writeToFile(fileDir,post)
+
+    @staticmethod
+    def writeToFile(directory,post):
+        content = ("## ["
+                   + post["postTitle"]
+                   + "]("
+                   + post["postURL"]
+                   + ")\n"
+                   + post["postContent"]
+                   + "\n\n---\n\n"
+                   + "submitted by [u/"
+                   + post["postSubmitter"]
+                   + "](https://www.reddit.com/user/"
+                   + post["postSubmitter"]
+                   + ")")
+
+        with io.open(directory,"w",encoding="utf-8") as FILE:
+            VanillaPrint(content,file=FILE)
+
+        print("Downloaded")
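For reference, the Markdown layout that `writeToFile` assembles (a linked title, the post body, a rule, and an attribution line) can be sketched as a standalone function; `build_self_post_markdown` is a hypothetical mirror of the concatenation above, not project code:

```python
def build_self_post_markdown(post):
    # Same pieces writeToFile joins: "## [title](url)", the selftext,
    # a horizontal rule, then "submitted by [u/name](profile url)".
    return ("## [" + post["postTitle"] + "](" + post["postURL"] + ")\n"
            + post["postContent"]
            + "\n\n---\n\n"
            + "submitted by [u/" + post["postSubmitter"]
            + "](https://www.reddit.com/user/" + post["postSubmitter"] + ")")
```

Saving the post as Markdown rather than plain text is what preserves the styling mentioned in the README's FAQ.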

View File

@@ -308,6 +308,10 @@ def redditSearcher(posts,SINGLE_POST=False):
     imgurCount = 0
     global directCount
     directCount = 0
+    global selfCount
+    selfCount = 0
+
+    allPosts = {}

     postsFile = createLogFile("POSTS")
@@ -356,13 +360,15 @@ def redditSearcher(posts,SINGLE_POST=False):
             printSubmission(submission,subCount,orderCount)
             subList.append(details)
-            postsFile.add({subCount:[details]})
+            allPosts = {**allPosts,**details}
+
+    postsFile.add(allPosts)

     if not len(subList) == 0:
         print(
             "\nTotal of {} submissions found!\n"\
-            "{} GFYCATs, {} IMGURs and {} DIRECTs\n"
-            .format(len(subList),gfycatCount,imgurCount,directCount)
+            "{} GFYCATs, {} IMGURs, {} DIRECTs and {} SELF POSTS\n"
+            .format(len(subList),gfycatCount,imgurCount,directCount,selfCount)
             )
         return subList
     else:
@@ -372,6 +378,7 @@ def checkIfMatching(submission):
     global gfycatCount
     global imgurCount
     global directCount
+    global selfCount

     try:
         details = {'postId':submission.id,
@@ -397,13 +404,15 @@ def checkIfMatching(submission):
             imgurCount += 1
             return details

-        elif isDirectLink(submission.url) is True:
+        elif isDirectLink(submission.url):
             details['postType'] = 'direct'
             directCount += 1
             return details

         elif submission.is_self:
             details['postType'] = 'self'
+            details['postContent'] = submission.selftext
+            selfCount += 1
             return details

 def printSubmission(SUB,validNumber,totalNumber):
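The branches visible in this hunk of `checkIfMatching` check direct links before falling back to `submission.is_self`. A hedged sketch of that ordering (the real function also recognizes imgur and gfycat URLs, which this hunk doesn't show; `is_direct_link` below is a naive stand-in for the project's `isDirectLink`, keyed on common media extensions):

```python
def classify_post(url, is_self):
    # Hypothetical stand-in for isDirectLink: treat a URL as a direct
    # media link if it ends in a common image/video extension.
    def is_direct_link(u):
        return u.rsplit(".", 1)[-1].lower() in {"jpg", "jpeg", "png",
                                                "gif", "mp4", "webm"}

    if is_direct_link(url):     # checked first, as in the diff
        return "direct"
    if is_self:                 # self posts are the later branch
        return "self"
    return None                 # no match: the post is skipped
```

The ordering matters: a self post whose URL somehow looked like a direct link would be classified as `direct`, mirroring the `elif` chain.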