Automate Your Research: 1
Published:
A Scientific Sink
Almost everyone I know reach research papers in their computers. Even though most papers are online, if you are like me, you ‘d probably love reading the pdf. It is neat, the figures are right around the text (I don’t like a paper where I have to turn a page to read legends :)). There are many software tools that
Move scientific publications to appropriate folder
Step 1. Write a script: off course
#!/usr/bin/env python3
import os
import re
import shutil
from PyPDF2 import PdfReader
# Setup
downloads = os.path.expanduser("~/Downloads")
target = os.path.join(downloads, "Journals_downloaded")
os.makedirs(target, exist_ok=True)
# DOI patterns
doi_patterns = [
re.compile(r'\b10\.\d{4,9}/[-._;()/:A-Z0-9]+\b', re.IGNORECASE),
re.compile(r'\bdoi:\s*10\.\d{4,9}/[-._;()/:A-Z0-9]+\b', re.IGNORECASE)
]
def contains_doi(text):
return any(p.search(text or '') for p in doi_patterns)
# Process PDFs
for file in os.listdir(downloads):
if file.lower().endswith(".pdf"):
path = os.path.join(downloads, file)
try:
reader = PdfReader(path)
texts = [p.extract_text() for p in reader.pages[:2]]
if any(contains_doi(t) for t in texts):
print(f"DOI found in: {file}")
shutil.move(path, os.path.join(target, file))
else:
print(f"No DOI in: {file}")
except Exception as e:
print(f"Failed to process {file}: {e}")
Step 2. Run the script periodically
Save your script as e.g., move_pdfs_with_doi.py
Make the script executable
chmod +x /path/to/move_pdfs_with_doi.py
Open crontab. Crontab (short for “cron table”) is a time-based job scheduler in Unix-like operating systems, including macOS and Linux. It allows you to automatically run commands or scripts at specified times and intervals — daily, weekly, monthly, or even every minute.
crontab -e
Schedule the script to run every Friday at 3.00PM 0 15 * * 5 /usr/bin/python3 /path/to/move_pdfs_with_doi.py