63 Commits

Author SHA1 Message Date
Ali Parlakçı
d6194c57d9 Merge pull request #67 from dbanon87/dbanon87/erome-downloads
fix erome download URLs
2019-10-17 09:51:02 +03:00
Ali Parlakçı
cd87a4a120 Merge pull request #68 from dbanon87/dbanon87/gitignore-env
add env/ to gitignore
2019-10-17 09:49:36 +03:00
dbanon87
08cddf4c83 add env/ to gitignore
This allows working in a virtualenv in the project directory.
2019-10-08 10:52:56 -04:00
dbanon87
88fa9e742d fix erome download URLs 2019-10-08 10:51:56 -04:00
Ali Parlakçı
1c17f174a8 typo 2019-04-23 17:00:42 +03:00
Ali Parlakçı
9b36336ac3 typo 2019-04-23 16:51:05 +03:00
Ali Parlakçı
35e551f20c Update README.md 2019-04-23 14:04:15 +03:00
Ali Parlakçı
0f2bda9c34 Merge pull request #63 from aliparlakci/moreUsefulReadme
A more useful readme (credits to *stared*)
2019-04-23 14:00:53 +03:00
Ali Parlakçı
8ab694bcc1 Fixed typo 2019-04-23 13:59:01 +03:00
Ali
898f59d035 Added an FAQ entry 2019-04-23 13:51:21 +03:00
Ali
6b6db37185 Minor corrections 2019-04-23 13:29:58 +03:00
Piotr Migdał
d4a5100128 a clearer description how to run it (#62) 2019-04-23 13:17:15 +03:00
Ali
22047338e2 Update version number 2019-04-09 20:45:22 +03:00
Ali
b16cdd3cbb Hopefully, fixed the config.json bug 2019-04-09 20:31:42 +03:00
Ali
2a8394a48c Fixed the bug concerning config.json 2019-04-08 22:09:52 +03:00
Ali Parlakçı
eac4404bbf Update README.md 2019-03-31 11:59:49 +03:00
Ali Parlakci
fae49d50da Update version 2019-03-31 11:46:03 +03:00
Ali Parlakci
7130525ece Update version 2019-03-31 11:35:27 +03:00
Ali Parlakci
2bf1e03ee1 Update version 2019-03-31 11:33:29 +03:00
Ali
15a91e5784 Fixed saving auth info problem 2019-02-24 12:28:40 +03:00
Ali
344201a70d Fixed v.redd.it links 2019-02-23 00:01:39 +03:00
Ali
92e47adb43 Update version 2019-02-22 23:59:57 +03:00
Ali
4d385fda60 Fixed v.redd.it links 2019-02-22 23:59:03 +03:00
Ali Parlakci
82dcd2f63d Bug fix 2019-01-27 17:05:31 +03:00
Ali Parlakci
08de21a364 Updated Python3 version 2019-01-27 16:32:43 +03:00
Ali Parlakci
af7d3d9151 Moved FAQ 2019-01-27 16:32:00 +03:00
Ali Parlakci
280147282b 27 jan update 2019-01-27 16:06:31 +03:00
Ali Parlakci
b7baf07fb5 Added instructions 2019-01-27 15:59:24 +03:00
Ali Parlakci
aece2273fb Merge branch 'master' of https://github.com/aliparlakci/bulk-downloader-for-reddit 2018-08-28 16:28:29 +03:00
Ali Parlakci
f807efe4d5 Ignore space at the end of directory 2018-08-28 16:27:29 +03:00
Ali Parlakci
743d887927 Ignore space at the end of directory 2018-08-28 16:24:14 +03:00
Ali Parlakci
da5492858c Add bs4 2018-08-28 16:15:22 +03:00
Ali Parlakci
cebfc713d2 Merge branch 'master' of https://github.com/aliparlakci/bulk-downloader-for-reddit 2018-08-28 16:12:01 +03:00
Ali Parlakci
f522154214 Update version 2018-08-28 16:11:48 +03:00
Ali Parlakçı
27cd3ee991 Changed getting gfycat links' algorithm 2018-08-28 16:10:15 +03:00
Ali Parlakci
29873331e6 Typo fix 2018-08-23 16:41:07 +03:00
Ali Parlakci
8a3dcd68a3 Update version 2018-08-23 12:16:31 +03:00
Ali Parlakci
ac323f2abe Bug fix 2018-08-23 12:09:56 +03:00
Ali Parlakci
32d26fa956 Print out github link at start 2018-08-20 15:13:42 +03:00
Ali Parlakci
137481cf3e Print out program info 2018-08-18 14:51:20 +03:00
Ali Parlakci
9b63c55d3e Print out version info before starting 2018-08-17 21:25:01 +03:00
Ali Parlakci
3a6954c7d3 Update version 2018-08-16 19:55:45 +03:00
Ali Parlakci
9a59da0c5f Update changelog 2018-08-16 19:53:33 +03:00
Ali Parlakci
d56efed1c6 Fix imgur download malfunction caused by headers 2018-08-16 19:51:56 +03:00
Ali Parlakci
8f64e62293 Update version 2018-08-15 21:52:54 +03:00
Ali Parlakci
bdc43eb0d8 Update changelog 2018-08-15 21:48:05 +03:00
Ali Parlakci
adccd8f3ba Prints the file that already exists 2018-08-15 21:46:27 +03:00
Ali Parlakci
47a07be1c8 Deleted commented line 2018-08-13 16:00:21 +03:00
Ali Parlakci
1a41dc6061 Update changelog 2018-08-13 15:56:37 +03:00
Ali Parlakci
50cb7c15b9 Fixed console prints for Linux 2018-08-13 15:55:37 +03:00
Ali Parlakci
a1f1915d57 Update changelog 2018-08-13 14:52:43 +03:00
Ali Parlakci
3448ba15a9 Added config file location as current directory 2018-08-13 14:50:45 +03:00
Ali Parlakci
ff68b5f70f Improved error handling 2018-08-10 19:50:52 +03:00
Ali Parlakci
588a3c3ea6 Update changelog 2018-08-10 13:09:28 +03:00
Ali Parlakci
8f1ff10a5e Added reddit username to config file 2018-08-10 13:08:24 +03:00
Ali Parlakci
9338961b2b Improved checkConflicts() 2018-08-09 09:26:01 +03:00
Ali Parlakci
94bc1c115f Minor edit in exception handling 2018-08-09 09:04:12 +03:00
Ali Parlakci
c19d8ad71b Refactored error handling 2018-08-09 00:17:04 +03:00
Ali Parlakci
4c8de50880 Fixed request headers 2018-08-08 00:47:34 +03:00
Ali Parlakci
3e6dfccdd2 Update request headers 2018-08-06 09:33:07 +03:00
Ali Parlakci
20b9747330 Added docstrings for the ease of modification 2018-08-06 08:13:07 +03:00
Ali Parlakci
be7508540d Refactored README page 2018-08-06 07:54:33 +03:00
Ali Parlakci
ccd9078b0a Refactored README page 2018-08-06 07:53:55 +03:00
12 changed files with 373 additions and 194 deletions

.gitignore (4 changes)

@@ -2,4 +2,6 @@ build/
dist/
MANIFEST
__pycache__/
src/__pycache__/
config.json
env/

README.md (211 changes)

@@ -1,9 +1,11 @@
# Bulk Downloader for Reddit
Downloads media from reddit posts.
## [Download the latest release](https://github.com/aliparlakci/bulk-downloader-for-reddit/releases/latest)
Downloads media from reddit posts. Made by [u/aliparlakci](https://reddit.com/u/aliparlakci)
## [Download the latest release here](https://github.com/aliparlakci/bulk-downloader-for-reddit/releases/latest)
## What it can do
- Can get posts from the frontpage, subreddits, multireddits, a redditor's submissions, upvoted and saved posts, search results, or plain reddit links
- Sorts posts by hot, top, new and so on
- Downloads **REDDIT** images and videos, **IMGUR** images and albums, **GFYCAT** links, **EROME** images and albums, **SELF POSTS** and any link to a **DIRECT IMAGE**
@@ -13,30 +15,127 @@ Downloads media from reddit posts.
- Saves a reusable copy of the details of the posts it finds so that they can be downloaded again
- Logs failed downloads in a file so that you can retry them later
## How it works
- For **Windows** and **Linux** users, there are executable files to run easily without installing a third party program. But if you are paranoid like me, you can **[compile it from source code](docs/COMPILE_FROM_SOURCE.md)**.
- **MacOS** users have to **[compile it from source code](docs/COMPILE_FROM_SOURCE.md)**.
## Installation
You can use it either as a `bulk-downloader-for-reddit.exe` executable file for Windows, as a Linux binary or as a *[Python script](#python-script)*. There is no MacOS executable, MacOS users must use the Python script option.
### Executables
For Windows and Linux, [download the latest executables, here](https://github.com/aliparlakci/bulk-downloader-for-reddit/releases/latest).
### Python script
* Download this repository ([latest zip](https://github.com/aliparlakci/bulk-downloader-for-reddit/archive/master.zip) or `git clone git@github.com:aliparlakci/bulk-downloader-for-reddit.git`).
* Enter its folder.
* Run `python ./script.py` from the command line (Windows, macOS or Linux; it may also work in the Anaconda prompt). See [here](docs/INTERPRET_FROM_SOURCE.md#finding-the-correct-keyword-for-python) if you have any trouble with this step.
The script requires Python 3.6 or above. It won't work with Python 3.5 or any Python 2.x. If you have trouble setting it up, see [here](docs/INTERPRET_FROM_SOURCE.md).
### Setting up the script
You need to create an imgur developer app in order for the API to work. Go to https://api.imgur.com/oauth2/addclient and fill in the form (how you fill it in does not really matter).
It should redirect you to a page that shows your **imgur_client_id** and **imgur_client_secret**.
When you run it for the first time, it will automatically create `config.json` file containing `imgur_client_id`, `imgur_client_secret`, `reddit_username` and `reddit_refresh_token`.
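A `config.json` with those fields might look like this (a sketch; all values are placeholders, not real credentials):

```json
{
    "imgur_client_id": "YOUR_IMGUR_CLIENT_ID",
    "imgur_client_secret": "YOUR_IMGUR_CLIENT_SECRET",
    "reddit_username": "YOUR_REDDIT_USERNAME",
    "reddit_refresh_token": "YOUR_REDDIT_REFRESH_TOKEN"
}
```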
## Running
You can run it in interactive mode, or using [command-line arguments](docs/COMMAND_LINE_ARGUMENTS.md) (also available via `python ./script.py --help` or `bulk-downloader-for-reddit.exe --help`).
To run the interactive mode, simply use `python ./script.py` or double click on `bulk-downloader-for-reddit.exe` without any extra commands.
### [Example for command line arguments](docs/COMMAND_LINE_ARGUMENTS.md#examples)
### Example for an interactive script
```
(py37) bulk-downloader-for-reddit user$ python ./script.py
Bulk Downloader for Reddit v1.6.5
Written by Ali PARLAKCI parlakciali@gmail.com
https://github.com/aliparlakci/bulk-downloader-for-reddit/
download directory: downloads/dataisbeautiful_last_few
select program mode:
[1] search
[2] subreddit
[3] multireddit
[4] submitted
[5] upvoted
[6] saved
[7] log
[0] exit
> 2
(type frontpage for all subscribed subreddits,
use plus to seperate multi subreddits: pics+funny+me_irl etc.)
subreddit: dataisbeautiful
select sort type:
[1] hot
[2] top
[3] new
[4] rising
[5] controversial
[0] exit
> 1
limit (0 for none): 50
GETTING POSTS
(1/24) r/dataisbeautiful
AutoModerator_[Battle]_DataViz_Battle_for_the_month_of_April_2019__Visualize_the_April_Fool's_Prank_for_2019-04-01_on__r_DataIsBeautiful_b8ws37.md
Downloaded
(2/24) r/dataisbeautiful
AutoModerator_[Topic][Open]_Open_Discussion_Monday_—_Anybody_can_post_a_general_visualization_question_or_start_a_fresh_discussion!_bg1wej.md
Downloaded
...
Total of 24 links downloaded!
Press enter to quit
```
## Additional options
The script also accepts additional options via command-line arguments. Get further information from **[`--help`](docs/COMMAND_LINE_ARGUMENTS.md)**
## Setting up the script
You need to create an imgur developer app in order for the API to work. Go to https://api.imgur.com/oauth2/addclient and fill in the form (how you fill it in does not really matter). It should redirect you to a page that shows your **imgur_client_id** and **imgur_client_secret**.
## FAQ
### I am running the script on a headless machine or on a remote server. How can I authenticate my reddit account?
- Download the script on your everyday computer and run it once.
- Authenticate the program on both reddit and imgur.
- Go to your Home folder (for Windows users it is `C:\Users\[USERNAME]\`, for Linux users it is `/home/[USERNAME]`)
- Copy the *config.json* file inside the Bulk Downloader for Reddit folder and paste it **next to** the file you use to run the program.
### How can I change my credentials?
- All of the user data is held in the **config.json** file, which is in a folder named "Bulk Downloader for Reddit" in your **Home** directory. You can edit it there.
Also, if you already have a config.json file, you can paste it **next to** the script to override the one in your Home directory.
### What do the dots represent when getting posts?
- Each dot means that 100 posts are scanned.
### Getting posts is taking too long.
- You can press Ctrl+C to interrupt it and start downloading.
### How are filenames formatted?
- Self posts and images that do not belong to an album are formatted as **`[SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]`**.
You can use the *reddit id* to open the post's reddit page at **reddit.com/[REDDIT ID]**
- An image in an imgur album is formatted as **`[ITEM NUMBER]_[IMAGE TITLE]_[IMGUR ID]`**
Similarly, you can use the *imgur id* to open the image's imgur page at **imgur.com/[IMGUR ID]**.
- Each dot means that 100 posts are scanned.
### Getting posts takes too long.
- You can press *Ctrl+C* to interrupt it and start downloading.
### How are the filenames formatted?
- **Self posts**, **images** that do not belong to an album, and **album folders** are formatted as:
`[SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]`
You can use the *reddit id* to open the post's reddit page at reddit.com/[REDDIT ID]
- An **image in an album** is formatted as:
`[ITEM NUMBER]_[IMAGE TITLE]_[IMGUR ID]`
Similarly, you can use the *imgur id* to open the image's imgur page at imgur.com/[IMGUR ID].
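The naming scheme above can be sketched in Python (a minimal illustration; `format_filename` is a hypothetical helper, and the script's real `nameCorrector` performs more thorough sanitization):

```python
def format_filename(submitter, title, reddit_id, extension=".md"):
    # [SUBMITTER NAME]_[POST TITLE]_[REDDIT ID] scheme from the FAQ above.
    # Only spaces are replaced here; the real sanitizer also strips
    # characters that are illegal in filenames.
    safe_title = title.replace(" ", "_")
    return f"{submitter}_{safe_title}_{reddit_id}{extension}"
```

For example, a self post by AutoModerator titled "Open Discussion Monday" with reddit id `bg1wej` would be saved as `AutoModerator_Open_Discussion_Monday_bg1wej.md`, and `reddit.com/bg1wej` opens the original post.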
### How do I open self post files?
- Self posts are stored on reddit as markdown, so the script downloads them as-is in order not to lose their styling.
@@ -44,70 +143,6 @@ Script also accepts additional options via command-line arguments. Get further i
However, they are basically text files. You can also view them with any text editor, such as Notepad on Windows, gedit on Linux or TextEdit on macOS.
### How can I change my credentials?
- All of the user data is held in the **config.json** file, which is in a folder named "Bulk Downloader for Reddit" in your **Home** directory. You can edit it there.
## Changelog
## Changes on *master*
### [06/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/210238d0865febcb57fbd9f0b0a7d3da9dbff384)
- Sending headers when requesting a file in order not to be rejected by the server
### [04/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/426089d0f35212148caff0082708a87017757bde)
- Disabled printing post types to console
### [30/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/af294929510f884d92b25eaa855c29fc4fb6dcaa)
- Now opens a web browser and goes to Imgur when prompting for Imgur credentials
### [26/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/1623722138bad80ae39ffcd5fb38baf80680deac)
- Improved verbose mode
- Minimized the console output
- Added quit option for auto quitting the program after process finishes
### [25/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/1623722138bad80ae39ffcd5fb38baf80680deac)
- Added verbose mode
- Stylized the console output
### [24/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/7a68ff3efac9939f9574c2cef6184b92edb135f4)
- Added OP's name to file names (backwards compatible)
- Deleted # char from file names (backwards compatible)
- Improved exception handling
### [23/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/7314e17125aa78fd4e6b28e26fda7ec7db7e0147)
- Split the download() function
- Added erome support
- Removed exclude feature
- Bug fixes
### [22/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/6e7463005051026ad64006a8580b0b5dc9536b8c)
- Put log files in a folder named "LOG_FILES"
- Fixed the bug that makes multireddit mode unusable
### [21/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/4a8c2377f9fb4d60ed7eeb8d50aaf9a26492462a)
- Added exclude mode
### [20/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/7548a010198fb693841ca03654d2c9bdf5742139)
- "0" input for no limit
- Fixed the bug that recognizes non-image direct links as image links
### [19/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/41cbb58db34f500a8a5ecc3ac4375bf6c3b275bb)
- Added v.redd.it support
- Added custom exception descriptions to FAILED.json file
- Fixed the bug that prevents downloading some gfycat URLs
### [13/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/9f831e1b784a770c82252e909462871401a05c11)
- Changed config.json file's path to home directory
### [12/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/50a77f6ba54c24f5647d5ea4e177400b71ff04a7)
- Added binaries for Windows and Linux
- Wait on KeyboardInterrupt
- Accept multiple subreddit input
- Fixed the bug that prevents choosing "[0] exit" by typing "exit"
### [11/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/a28a7776ab826dea2a8d93873a94cd46db3a339b)
- Improvements on UX and UI
- Added logging errors to CONSOLE_LOG.txt
- Uses the current directory if a directory has not been given yet
### [10/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/ffe3839aee6dc1a552d95154d817aefc2b66af81)
- Added support for *self* posts
- Now getting posts is quicker
* [See the changes on *master* here](docs/CHANGELOG.md)

docs/CHANGELOG.md (new file, 86 additions)

@@ -0,0 +1,86 @@
# Changes on *master*
## [23/02/2019](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/4d385fda60028343be816eb7c4f7bc613a9d555d)
- Fixed v.redd.it links
## [27/01/2019](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/b7baf07fb5998368d87e3c4c36aed40daf820609)
- Clarified the instructions
## [28/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/d56efed1c6833a66322d9158523b89d0ce57f5de)
- Adjusted the algorithm used for extracting gfycat links because of gfycat's design change
- Ignore space at the end of the given directory
## [16/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/d56efed1c6833a66322d9158523b89d0ce57f5de)
- Fix the bug that prevents downloading imgur videos
## [15/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/adccd8f3ba03ad124d58643d78dab287a4123a6f)
- Prints out the titles of posts that are already downloaded
## [13/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/50cb7c15b9cb4befce0cfa2c23ab5de4af9176c6)
- Added alternative location of current directory for config file
- Fixed console prints on Linux
## [10/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/8f1ff10a5e11464575284210dbba4a0d387bc1c3)
- Added reddit username to config file
## [06/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/210238d0865febcb57fbd9f0b0a7d3da9dbff384)
- Sending headers when requesting a file in order not to be rejected by the server
## [04/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/426089d0f35212148caff0082708a87017757bde)
- Disabled printing post types to console
## [30/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/af294929510f884d92b25eaa855c29fc4fb6dcaa)
- Now opens a web browser and goes to Imgur when prompting for Imgur credentials
## [26/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/1623722138bad80ae39ffcd5fb38baf80680deac)
- Improved verbose mode
- Minimized the console output
- Added quit option for auto quitting the program after process finishes
## [25/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/1623722138bad80ae39ffcd5fb38baf80680deac)
- Added verbose mode
- Stylized the console output
## [24/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/7a68ff3efac9939f9574c2cef6184b92edb135f4)
- Added OP's name to file names (backwards compatible)
- Deleted # char from file names (backwards compatible)
- Improved exception handling
## [23/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/7314e17125aa78fd4e6b28e26fda7ec7db7e0147)
- Split the download() function
- Added erome support
- Removed exclude feature
- Bug fixes
## [22/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/6e7463005051026ad64006a8580b0b5dc9536b8c)
- Put log files in a folder named "LOG_FILES"
- Fixed the bug that makes multireddit mode unusable
## [21/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/4a8c2377f9fb4d60ed7eeb8d50aaf9a26492462a)
- Added exclude mode
## [20/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/7548a010198fb693841ca03654d2c9bdf5742139)
- "0" input for no limit
- Fixed the bug that recognizes non-image direct links as image links
## [19/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/41cbb58db34f500a8a5ecc3ac4375bf6c3b275bb)
- Added v.redd.it support
- Added custom exception descriptions to FAILED.json file
- Fixed the bug that prevents downloading some gfycat URLs
## [13/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/9f831e1b784a770c82252e909462871401a05c11)
- Changed config.json file's path to home directory
## [12/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/50a77f6ba54c24f5647d5ea4e177400b71ff04a7)
- Added binaries for Windows and Linux
- Wait on KeyboardInterrupt
- Accept multiple subreddit input
- Fixed the bug that prevents choosing "[0] exit" by typing "exit"
## [11/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/a28a7776ab826dea2a8d93873a94cd46db3a339b)
- Improvements on UX and UI
- Added logging errors to CONSOLE_LOG.txt
- Uses the current directory if a directory has not been given yet
## [10/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/ffe3839aee6dc1a552d95154d817aefc2b66af81)
- Added support for *self* posts
- Now getting posts is quicker

docs/COMMAND_LINE_ARGUMENTS.md

@@ -1,6 +1,6 @@
# Using command-line arguments
See **[compiling from source](COMPILE_FROM_SOURCE.md)** page first unless you are using an executable file. If you are using an executable file, see [using terminal](COMPILE_FROM_SOURCE.md#using-terminal) and come back.
See **[compiling from source](INTERPRET_FROM_SOURCE.md)** page first unless you are using an executable file. If you are using an executable file, see [using terminal](INTERPRET_FROM_SOURCE.md#using-terminal) and come back.
***Use*** `.\bulk-downloader-for-reddit.exe` ***or*** `./bulk-downloader-for-reddit` ***if you are using the executable***.
```console
@@ -98,4 +98,4 @@ python script.py --directory C:\\NEW_FOLDER\\ANOTHER_FOLDER --log UNNAMED_FOLDER
# FAQ
## I can't start up the script no matter what.
See **[finding the correct keyword for Python](COMPILE_FROM_SOURCE.md#finding-the-correct-keyword-for-python)**
See **[finding the correct keyword for Python](INTERPRET_FROM_SOURCE.md#finding-the-correct-keyword-for-python)**


@@ -1,16 +1,16 @@
# Compiling from source code
# Interpret from source code
## Requirements
### Python 3 Interpreter
Latest* version of **Python 3** is needed. See if it is already installed [here](#finding-the-correct-keyword-for-python). If not, download the matching release for your platform [here](https://www.python.org/downloads/) and install it. If you are a *Windows* user, selecting **Add Python 3 to PATH** option when installing the software is mandatory.
\* *Use Python 3.6.5 if you encounter an issue*
- This program is designed to work best on **Python 3.6.5**, so that version is suggested. See if it is already installed [here](#finding-the-correct-keyword-for-python).
- If not, download the matching release for your platform [here](https://www.python.org/downloads/) and install it. If you are a *Windows* user, selecting **Add Python 3 to PATH** option when installing the software is mandatory.
## Using terminal
### To open it...
- **On Windows**: Press **Shift+Right Click**, select **Open Powershell window here** or **Open Command Prompt window here**
- **on Windows**: Press **Shift+Right Click**, select **Open Powershell window here** or **Open Command Prompt window here**
- **On Linux**: Right-click in a folder and select **Open Terminal** or press **Ctrl+Alt+T**.
- **on Linux**: Right-click in a folder and select **Open Terminal** or press **Ctrl+Alt+T**.
- **On MacOS**: Look for an app called **Terminal**.
- **on MacOS**: Look for an app called **Terminal**.
### Navigating to the directory where the script is downloaded
Go inside the folder where script.py is located. If you are not familiar with changing directories in the command prompt or terminal, read *Changing Directories* in [this article](https://lifehacker.com/5633909/who-needs-a-mouse-learn-to-use-the-command-line-for-almost-anything)

requirements.txt

@@ -1,3 +1,4 @@
bs4
requests
praw
imgurpython

script.py (108 changes)

@@ -23,7 +23,7 @@ from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector,
__author__ = "Ali Parlakci"
__license__ = "GPL"
__version__ = "1.6.1"
__version__ = "1.6.5"
__maintainer__ = "Ali Parlakci"
__email__ = "parlakciali@gmail.com"
@@ -184,9 +184,10 @@ def checkConflicts():
else:
user = 1
search = 1 if GLOBAL.arguments.search else 0
modes = [
"saved","subreddit","submitted","search","log","link","upvoted",
"multireddit"
"saved","subreddit","submitted","log","link","upvoted","multireddit"
]
values = {
@@ -199,15 +200,18 @@ def checkConflicts():
if not sum(values[x] for x in values) == 1:
raise ProgramModeError("Invalid program mode")
if values["search"]+values["saved"] == 2:
if search+values["saved"] == 2:
raise SearchModeError("You cannot search in your saved posts")
if values["search"]+values["submitted"] == 2:
if search+values["submitted"] == 2:
raise SearchModeError("You cannot search in submitted posts")
if values["search"]+values["upvoted"] == 2:
if search+values["upvoted"] == 2:
raise SearchModeError("You cannot search in upvoted posts")
if search+values["log"] == 2:
raise SearchModeError("You cannot search in log files")
if values["upvoted"]+values["submitted"] == 1 and user == 0:
raise RedditorNameError("No redditor name given")
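The hunk above tracks `search` separately from the mode flags so that searching only combines with modes that support it. That check can be sketched standalone (a simplified illustration with hypothetical names; the real `checkConflicts()` also validates redditor names and other options):

```python
MODES = ["saved", "subreddit", "submitted", "log", "link", "upvoted", "multireddit"]
UNSEARCHABLE = ["saved", "submitted", "upvoted", "log"]

def check_conflicts(selected, search=False):
    # Exactly one program mode must be active.
    values = {mode: (mode in selected) for mode in MODES}
    if sum(values.values()) != 1:
        raise ValueError("Invalid program mode")
    # Search only combines with modes reddit can actually search in.
    if search and any(values[mode] for mode in UNSEARCHABLE):
        raise ValueError("You cannot search in that mode")
```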
@@ -261,18 +265,22 @@ class PromptUser:
if programMode == "subreddit":
subredditInput = input("subreddit (enter frontpage for frontpage): ")
subredditInput = input("(type frontpage for all subscribed subreddits,\n" \
" use plus to seperate multi subreddits:" \
" pics+funny+me_irl etc.)\n\n" \
"subreddit: ")
GLOBAL.arguments.subreddit = subredditInput
while not (subredditInput == "" or subredditInput.lower() == "frontpage"):
subredditInput = input("subreddit: ")
GLOBAL.arguments.subreddit += "+" + subredditInput
# while not (subredditInput == "" or subredditInput.lower() == "frontpage"):
# subredditInput = input("subreddit: ")
# GLOBAL.arguments.subreddit += "+" + subredditInput
if " " in GLOBAL.arguments.subreddit:
GLOBAL.arguments.subreddit = "+".join(GLOBAL.arguments.subreddit.split())
# DELETE THE PLUS (+) AT THE END
if not subredditInput.lower() == "frontpage":
if not subredditInput.lower() == "frontpage" \
and GLOBAL.arguments.subreddit[-1] == "+":
GLOBAL.arguments.subreddit = GLOBAL.arguments.subreddit[:-1]
print("\nselect sort type:")
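The commented-out loop above is replaced by a single prompt whose input is normalized afterwards: spaces become `+` separators and any trailing `+` is stripped. A sketch of that normalization (the helper name is hypothetical):

```python
def normalize_subreddits(raw: str) -> str:
    # "pics funny me_irl" and "pics+funny+me_irl" both become the
    # plus-separated form reddit expects for multi-subreddit requests.
    joined = "+".join(raw.split())
    return joined.rstrip("+")
```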
@@ -293,7 +301,7 @@ class PromptUser:
GLOBAL.arguments.time = "all"
elif programMode == "multireddit":
GLOBAL.arguments.user = input("\nredditor: ")
GLOBAL.arguments.user = input("\nmultireddit owner: ")
GLOBAL.arguments.multireddit = input("\nmultireddit: ")
print("\nselect sort type:")
@@ -385,10 +393,7 @@ def prepareAttributes():
GLOBAL.arguments.link = GLOBAL.arguments.link.strip("\"")
try:
ATTRIBUTES = LinkDesigner(GLOBAL.arguments.link)
except InvalidRedditLink:
raise InvalidRedditLink
ATTRIBUTES = LinkDesigner(GLOBAL.arguments.link)
if GLOBAL.arguments.search is not None:
ATTRIBUTES["search"] = GLOBAL.arguments.search
@@ -418,7 +423,7 @@ def prepareAttributes():
ATTRIBUTES["submitted"] = True
if GLOBAL.arguments.sort == "rising":
raise InvalidSortingType
raise InvalidSortingType("Invalid sorting type has given")
ATTRIBUTES["limit"] = GLOBAL.arguments.limit
@@ -455,6 +460,9 @@ def isPostExists(POST):
possibleExtensions = [".jpg",".png",".mp4",".gif",".webm",".md"]
"""If you change the filenames, don't forget to add them here.
Please don't remove existing ones
"""
for extension in possibleExtensions:
OLD_FILE_PATH = PATH / (
@@ -481,6 +489,8 @@ def isPostExists(POST):
return False
def downloadPost(SUBMISSION):
"""Download directory is declared here for each file"""
directory = GLOBAL.directory / SUBMISSION['postSubreddit']
global lastRequestTime
@@ -563,7 +573,10 @@ def download(submissions):
print(f" {submissions[i]['postType'].upper()}",end="",noPrint=True)
if isPostExists(submissions[i]):
print("\nIt already exists")
print(f"\n" \
f"{submissions[i]['postSubmitter']}_"
f"{nameCorrector(submissions[i]['postTitle'])}")
print("It already exists")
duplicates += 1
downloadedCount -= 1
continue
@@ -626,60 +639,61 @@ def download(submissions):
downloadedCount -= 1
if duplicates:
print("\n There was {} duplicates".format(duplicates))
print(f"\nThere {'were' if duplicates > 1 else 'was'} " \
f"{duplicates} duplicate{'s' if duplicates > 1 else ''}")
if downloadedCount == 0:
print(" Nothing downloaded :(")
print("Nothing downloaded :(")
else:
print(" Total of {} links downloaded!".format(downloadedCount))
print(f"Total of {downloadedCount} " \
f"link{'s' if downloadedCount > 1 else ''} downloaded!")
def main():
VanillaPrint(
f"\nBulk Downloader for Reddit v{__version__}\n" \
f"Written by Ali PARLAKCI parlakciali@gmail.com\n\n" \
f"https://github.com/aliparlakci/bulk-downloader-for-reddit/"
)
GLOBAL.arguments = parseArguments()
if GLOBAL.arguments.directory is not None:
GLOBAL.directory = Path(GLOBAL.arguments.directory)
GLOBAL.directory = Path(GLOBAL.arguments.directory.strip())
else:
GLOBAL.directory = Path(input("download directory: "))
GLOBAL.directory = Path(input("\ndownload directory: ").strip())
print("\n"," ".join(sys.argv),"\n",noPrint=True)
print(f"Bulk Downloader for Reddit v{__version__}\n",noPrint=True
)
try:
checkConflicts()
except ProgramModeError as err:
PromptUser()
if not Path(GLOBAL.configDirectory).is_dir():
os.makedirs(GLOBAL.configDirectory)
GLOBAL.config = getConfig(GLOBAL.configDirectory / "config.json")
if not Path(GLOBAL.defaultConfigDirectory).is_dir():
os.makedirs(GLOBAL.defaultConfigDirectory)
if Path("config.json").exists():
GLOBAL.configDirectory = Path("config.json")
else:
GLOBAL.configDirectory = GLOBAL.defaultConfigDirectory / "config.json"
GLOBAL.config = getConfig(GLOBAL.configDirectory)
if GLOBAL.arguments.log is not None:
logDir = Path(GLOBAL.arguments.log)
download(postFromLog(logDir))
sys.exit()
try:
POSTS = getPosts(prepareAttributes())
except InsufficientPermission:
print("You do not have permission to do that")
sys.exit()
except NoMatchingSubmissionFound:
print("No matching submission was found")
sys.exit()
except NoRedditSupoort:
print("Reddit does not support that")
sys.exit()
except NoPrawSupport:
print("PRAW does not support that")
sys.exit()
except MultiredditNotFound:
print("Multireddit not found")
sys.exit()
except InvalidSortingType:
print("Invalid sorting type has given")
sys.exit()
except InvalidRedditLink:
print("Invalid reddit link")
except Exception as exc:
logging.error(sys.exc_info()[0].__name__,
exc_info=full_exc_info(sys.exc_info()))
print(log_stream.getvalue(),noPrint=True)
print(exc)
sys.exit()
if POSTS is None:

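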

@@ -1,13 +1,15 @@
import io
import json
import os
import sys
import urllib.request
from html.parser import HTMLParser
from multiprocessing import Queue
from pathlib import Path
from urllib.error import HTTPError
import imgurpython
from multiprocessing import Queue
from bs4 import BeautifulSoup
from src.errors import (AlbumNotDownloadedCompletely, FileAlreadyExistsError,
FileNameTooLong, ImgurLoginError,
@@ -23,8 +25,7 @@ def dlProgress(count, blockSize, totalSize):
downloadedMbs = int(count*blockSize*(10**(-6)))
fileSize = int(totalSize*(10**(-6)))
sys.stdout.write("\r{}Mb/{}Mb".format(downloadedMbs,fileSize))
sys.stdout.write("\b"*len("\r{}Mb/{}Mb".format(downloadedMbs,fileSize)))
sys.stdout.write("{}Mb/{}Mb\r".format(downloadedMbs,fileSize))
sys.stdout.flush()
def getExtension(link):
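The `dlProgress` change drops the backspace trick and instead ends each update with a carriage return, so the next write overwrites the same console line. A self-contained sketch of the new behavior:

```python
import sys

def dl_progress(count, block_size, total_size):
    # Integer division keeps the Mb figures exact.
    downloaded_mb = count * block_size // 10**6
    total_mb = total_size // 10**6
    # The trailing "\r" returns the cursor to column 0 for the next update.
    sys.stdout.write("{}Mb/{}Mb\r".format(downloaded_mb, total_mb))
    sys.stdout.flush()
```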
@@ -55,10 +56,11 @@ def getFile(fileDir,tempDir,imageURL,indent=0):
"""
headers = [
("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 " \
"(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"),
("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " \
"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 "\
"Safari/537.36 OPR/54.0.2952.64"),
("Accept", "text/html,application/xhtml+xml,application/xml;" \
"q=0.9,*/*;q=0.8"),
"q=0.9,image/webp,image/apng,*/*;q=0.8"),
("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3"),
("Accept-Encoding", "none"),
("Accept-Language", "en-US,en;q=0.8"),
@@ -66,7 +68,8 @@ def getFile(fileDir,tempDir,imageURL,indent=0):
]
opener = urllib.request.build_opener()
opener.addheaders = headers
if not "imgur" in imageURL:
opener.addheaders = headers
urllib.request.install_opener(opener)
if not (os.path.isfile(fileDir)):
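The hunk above attaches browser-like headers for most hosts but leaves the default opener untouched for imgur, which was rejecting requests that carried them. A simplified sketch (`make_opener` is hypothetical, and the header set is abbreviated; the real code installs the opener globally with `urllib.request.install_opener`):

```python
import urllib.request

# Abbreviated header set modeled on the one in the diff above.
HEADERS = [
    ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/67.0.3396.87 Safari/537.36"),
    ("Accept-Language", "en-US,en;q=0.8"),
]

def make_opener(image_url):
    opener = urllib.request.build_opener()
    if "imgur" not in image_url:  # imgur rejects the extra headers
        opener.addheaders = HEADERS
    return opener
```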
@@ -102,6 +105,8 @@ class Erome:
extension = getExtension(IMAGES[0])
"""Filenames are declared here"""
title = nameCorrector(post['postTitle'])
print(post["postSubmitter"]+"_"+title+"_"+post['postId']+extension)
@@ -112,7 +117,9 @@ class Erome:
post["postSubmitter"]+"_"+title+"_"+post['postId']+".tmp"
)
imageURL = "https:" + IMAGES[0]
imageURL = IMAGES[0]
if 'https://' not in imageURL and 'http://' not in imageURL:
imageURL = "https://" + imageURL
try:
getFile(fileDir,tempDir,imageURL)
@@ -141,7 +148,9 @@ class Erome:
extension = getExtension(IMAGES[i])
fileName = str(i+1)
imageURL = "https:" + IMAGES[i]
imageURL = IMAGES[i]
if 'https://' not in imageURL and 'http://' not in imageURL:
imageURL = "https://" + imageURL
fileDir = folderDir / (fileName + extension)
tempDir = folderDir / (fileName + ".tmp")
@@ -237,8 +246,11 @@ class Imgur:
post['mediaURL'] = content['object'].link
post['postExt'] = getExtension(post['mediaURL'])
title = nameCorrector(post['postTitle'])
"""Filenames are declared here"""
print(post["postSubmitter"]+"_"+title+"_"+post['postId']+post['postExt'])
fileDir = directory / (
@@ -297,6 +309,8 @@ class Imgur:
+ "_"
+ images[i]['id'])
"""Filenames are declared here"""
fileDir = folderDir / (fileName + images[i]['Ext'])
tempDir = folderDir / (fileName + ".tmp")
@@ -393,12 +407,17 @@ class Gfycat:
except IndexError:
raise NotADownloadableLinkError("Could not read the page source")
except Exception as exception:
#debug
raise exception
raise NotADownloadableLinkError("Could not read the page source")
POST['postExt'] = getExtension(POST['mediaURL'])
if not os.path.exists(directory): os.makedirs(directory)
title = nameCorrector(POST['postTitle'])
"""Filenames are declared here"""
print(POST["postSubmitter"]+"_"+title+"_"+POST['postId']+POST['postExt'])
fileDir = directory / (
@@ -429,30 +448,25 @@ class Gfycat:
url = "https://gfycat.com/" + url.split('/')[-1]
pageSource = (urllib.request.urlopen(url).read().decode().split('\n'))
pageSource = (urllib.request.urlopen(url).read().decode())
theLine = pageSource[lineNumber]
lenght = len(query)
link = []
soup = BeautifulSoup(pageSource, "html.parser")
attributes = {"data-react-helmet":"true","type":"application/ld+json"}
content = soup.find("script",attrs=attributes)
for i in range(len(theLine)):
if theLine[i:i+lenght] == query:
cursor = (i+lenght)+1
while not theLine[cursor] == '"':
link.append(theLine[cursor])
cursor += 1
break
if "".join(link) == "":
if content is None:
raise NotADownloadableLinkError("Could not read the page source")
return "".join(link)
return json.loads(content.text)["video"]["contentUrl"]
class Direct:
def __init__(self,directory,POST):
POST['postExt'] = getExtension(POST['postURL'])
if not os.path.exists(directory): os.makedirs(directory)
title = nameCorrector(POST['postTitle'])
"""Filenames are declared here"""
print(POST["postSubmitter"]+"_"+title+"_"+POST['postId']+POST['postExt'])
fileDir = directory / (
@@ -475,6 +489,9 @@ class Self:
if not os.path.exists(directory): os.makedirs(directory)
title = nameCorrector(post['postTitle'])
"""Filenames are declared here"""
print(post["postSubmitter"]+"_"+title+"_"+post['postId']+".md")
fileDir = directory / (
@@ -494,7 +511,8 @@ class Self:
@staticmethod
def writeToFile(directory,post):
"""Self posts are formatted here"""
content = ("## ["
+ post["postTitle"]
+ "]("
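The Erome hunks above stop assuming protocol-relative URLs (`"https:" + IMAGES[0]`) and instead prepend a scheme only when one is missing. A minimal standalone sketch of that normalization, with a hypothetical helper name not taken from the project:

```python
def ensure_https(image_url):
    """Hypothetical helper: give scheme-less or protocol-relative
    image URLs an explicit scheme, mirroring the erome URL fix."""
    if image_url.startswith("//"):  # protocol-relative, e.g. //v.erome.com/...
        return "https:" + image_url
    if not (image_url.startswith("http://") or image_url.startswith("https://")):
        return "https://" + image_url
    return image_url  # already has a scheme; leave untouched
```

This handles both the old protocol-relative form and the bare-host form, while leaving fully qualified URLs alone.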


@@ -67,7 +67,7 @@ class NoMatchingSubmissionFound(Exception):
class NoPrawSupport(Exception):
pass
class NoRedditSupoort(Exception):
class NoRedditSupport(Exception):
pass
class MultiredditNotFound(Exception):


@@ -29,7 +29,7 @@ def LinkParser(LINK):
ShortLink = False
if not "reddit.com" in LINK:
raise InvalidRedditLink
raise InvalidRedditLink("Invalid reddit link")
SplittedLink = LINK.split("/")


@@ -3,13 +3,15 @@ import sys
import random
import socket
import webbrowser
import urllib.request
from urllib.error import HTTPError
import praw
from prawcore.exceptions import NotFound, ResponseException, Forbidden
from src.tools import GLOBAL, createLogFile, jsonFile, printToFile
from src.errors import (NoMatchingSubmissionFound, NoPrawSupport,
NoRedditSupoort, MultiredditNotFound,
NoRedditSupport, MultiredditNotFound,
InvalidSortingType, RedditLoginFailed,
InsufficientPermission)
@@ -48,6 +50,7 @@ def beginPraw(config,user_agent = str(socket.gethostname())):
self.client = self.recieve_connection()
data = self.client.recv(1024).decode('utf-8')
str(data)
param_tokens = data.split(' ', 2)[1].split('?', 1)[1].split('&')
params = {
key: value for (key, value) in [token.split('=') \
@@ -92,7 +95,8 @@ def beginPraw(config,user_agent = str(socket.gethostname())):
authorizedInstance = GetAuth(reddit,port).getRefreshToken(*scopes)
reddit = authorizedInstance[0]
refresh_token = authorizedInstance[1]
jsonFile(GLOBAL.configDirectory / "config.json").add({
jsonFile(GLOBAL.configDirectory).add({
"reddit_username":str(reddit.user.me()),
"reddit_refresh_token":refresh_token
})
else:
@@ -101,7 +105,8 @@ def beginPraw(config,user_agent = str(socket.gethostname())):
authorizedInstance = GetAuth(reddit,port).getRefreshToken(*scopes)
reddit = authorizedInstance[0]
refresh_token = authorizedInstance[1]
jsonFile(GLOBAL.configDirectory / "config.json").add({
jsonFile(GLOBAL.configDirectory).add({
"reddit_username":str(reddit.user.me()),
"reddit_refresh_token":refresh_token
})
return reddit
@@ -115,7 +120,7 @@ def getPosts(args):
reddit = beginPraw(config)
if args["sort"] == "best":
raise NoPrawSupport
raise NoPrawSupport("PRAW does not support that")
if "subreddit" in args:
if "search" in args:
@@ -144,8 +149,8 @@ def getPosts(args):
}
if "search" in args:
if args["sort"] in ["hot","rising","controversial"]:
raise InvalidSortingType
if GLOBAL.arguments.sort in ["hot","rising","controversial"]:
raise InvalidSortingType("Invalid sorting type was given")
if "subreddit" in args:
print (
@@ -169,16 +174,16 @@ def getPosts(args):
)
elif "multireddit" in args:
raise NoPrawSupport
raise NoPrawSupport("PRAW does not support that")
elif "user" in args:
raise NoPrawSupport
raise NoPrawSupport("PRAW does not support that")
elif "saved" in args:
raise NoRedditSupoort
raise NoRedditSupport("Reddit does not support that")
if args["sort"] == "relevance":
raise InvalidSortingType
raise InvalidSortingType("Invalid sorting type was given")
if "saved" in args:
print(
@@ -243,7 +248,7 @@ def getPosts(args):
) (**keyword_params)
)
except NotFound:
raise MultiredditNotFound
raise MultiredditNotFound("Multireddit not found")
elif "submitted" in args:
print (
@@ -273,7 +278,7 @@ def getPosts(args):
reddit.redditor(args["user"]).upvoted(limit=args["limit"])
)
except Forbidden:
raise InsufficientPermission
raise InsufficientPermission("You do not have permission to do that")
elif "post" in args:
print("post: {post}\n".format(post=args["post"]).upper(),noPrint=True)
@@ -385,7 +390,7 @@ def redditSearcher(posts,SINGLE_POST=False):
print()
return subList
else:
raise NoMatchingSubmissionFound
raise NoMatchingSubmissionFound("No matching submission was found")
def checkIfMatching(submission):
global gfycatCount
@@ -419,18 +424,20 @@ def checkIfMatching(submission):
eromeCount += 1
return details
elif isDirectLink(submission.url) is not False:
details['postType'] = 'direct'
details['postURL'] = isDirectLink(submission.url)
directCount += 1
return details
elif submission.is_self:
details['postType'] = 'self'
details['postContent'] = submission.selftext
selfCount += 1
return details
directLink = isDirectLink(submission.url)
if directLink is not False:
details['postType'] = 'direct'
details['postURL'] = directLink
directCount += 1
return details
def printSubmission(SUB,validNumber,totalNumber):
"""Print post's link, title and media link to screen"""
@@ -470,7 +477,22 @@ def isDirectLink(URL):
return URL
elif "v.redd.it" in URL:
return URL+"/DASH_600_K"
bitrates = ["DASH_1080","DASH_720","DASH_600", \
"DASH_480","DASH_360","DASH_240"]
for bitrate in bitrates:
videoURL = URL+"/"+bitrate
try:
responseCode = urllib.request.urlopen(videoURL).getcode()
except urllib.error.HTTPError:
responseCode = 0
if responseCode == 200:
return videoURL
else:
return False
for extension in imageTypes:
if extension in URL:
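The `isDirectLink` hunk above replaces the hard-coded `DASH_600_K` suffix with a probe over DASH renditions, highest bitrate first. A self-contained sketch of that probing loop (the function name is hypothetical; this version keeps trying lower bitrates on failure rather than returning immediately):

```python
import urllib.error
import urllib.request

# Renditions tried from highest to lowest quality, as in the diff above
DASH_BITRATES = ["DASH_1080", "DASH_720", "DASH_600",
                 "DASH_480", "DASH_360", "DASH_240"]

def pick_vreddit_url(base_url):
    """Return the first v.redd.it DASH rendition answering HTTP 200, else None."""
    for bitrate in DASH_BITRATES:
        candidate = base_url + "/" + bitrate
        try:
            if urllib.request.urlopen(candidate).getcode() == 200:
                return candidate
        except urllib.error.HTTPError:
            continue  # rendition missing; fall back to a lower bitrate
    return None
```

Probing each rendition costs one HTTP round-trip per miss, but avoids downloading a bitrate the post never published.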


@@ -14,7 +14,8 @@ class GLOBAL:
config = None
arguments = None
directory = None
configDirectory = Path.home() / "Bulk Downloader for Reddit"
defaultConfigDirectory = Path.home() / "Bulk Downloader for Reddit"
configDirectory = ""
reddit_client_id = "BSyphDdxYZAgVQ"
reddit_client_secret = "bfqNJaRh8NMh-9eAr-t4TRz-Blk"
printVanilla = print