Introduction
Web novels are an increasingly popular form of fiction, published across many online platforms. Keeping them available for offline reading takes some work, but it is manageable. This post walks through scraping web novel data with Python and converting it into ePUB, PDF, and DOCX files that you can read offline.
Step 1: Install Required Libraries
First, ensure you have Python installed. Then, install the necessary libraries with the following commands:
pip install requests
pip install beautifulsoup4
pip install ebooklib
pip install fpdf
pip install python-docx
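Before moving on, it can help to confirm that the packages are importable. Some install names differ from their import names (beautifulsoup4 is imported as bs4, python-docx as docx). The following is a minimal sketch using only the standard library; the helper name missing_modules is a hypothetical one chosen for this example:

```python
from importlib.util import find_spec

# Import names for the packages installed above.
# Note the pip name -> import name differences:
# beautifulsoup4 -> bs4, python-docx -> docx.
REQUIRED = ["requests", "bs4", "ebooklib", "fpdf", "docx"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

If anything is reported as missing, rerun the matching pip install command in the same Python environment you use to run the scripts.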
Step 2: Scrape Web Novels Data
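The code below scrapes novel listings from a hypothetical platform. The URL and the CSS class names are placeholders; adjust them to match the markup of the site you are working with: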
import requests
from bs4 import BeautifulSoup
# URL of the web novel platform (placeholder)
url = 'https://www.examplewebnovelplatform.com'
# Send a GET request; time out instead of hanging, and fail on HTTP errors
response = requests.get(url, timeout=10)
response.raise_for_status()
# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
# Find and extract the desired data (e.g., novel titles, authors, summaries)
novels = soup.find_all('div', class_='novel-item')
for novel in novels:
    # get_text(strip=True) trims surrounding whitespace from each field
    title = novel.find('h2', class_='novel-title').get_text(strip=True)
    author = novel.find('span', class_='novel-author').get_text(strip=True)
    summary = novel.find('p', class_='novel-summary').get_text(strip=True)
    print(f'Title: {title}')
    print(f'Author: {author}')
    print(f'Summary: {summary}')
    print('---')
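The loop above prints each field and then discards it. If the scraped data is meant to feed the conversion step, one option is to collect it into records first. The sketch below is an illustration, not part of the scraper itself: the Novel class and the helpers clean and novels_to_json are hypothetical names, and the sample values are made up.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Novel:
    """One scraped listing (hypothetical record type for this example)."""
    title: str
    author: str
    summary: str


def clean(text):
    """Collapse runs of whitespace (including newlines) and trim the ends."""
    return " ".join(text.split())


def novels_to_json(records):
    """Serialize records to JSON, keeping non-ASCII characters readable."""
    return json.dumps([asdict(r) for r in records], ensure_ascii=False, indent=2)


# Made-up values standing in for text pulled out of the page.
records = [
    Novel(
        title=clean("  The  Example\n Saga "),
        author=clean("Jane Doe"),
        summary=clean(" A placeholder summary. "),
    )
]
print(novels_to_json(records))
```

Inside the scraping loop, each iteration could append a Novel to the list instead of printing, and the JSON string could be written to a file for later conversion.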
Step 3: Convert Data to Desired Formats
Convert to ePUB:
from ebooklib import epub
# Create an ePUB book
book = epub.EpubBook()
# Add metadata
book.set_title('Sample Web Novel')
book.set_language('en')
# Create a chapter
chapter = epub.EpubHtml(title='Chapter 1', file_name='chap_01.xhtml', lang='en')
chapter.content = '<h1>Chapter 1</h1><p>This is the first chapter of the sample web novel.</p>'
# Add chapter to the book
book.add_item(chapter)
# Define Table Of Contents
book.toc = (epub.Link('chap_01.xhtml', 'Chapter 1', 'chap_01'),)
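# Add default NCX and Nav files
book.add_item(epub.EpubNcx())
book.add_item(epub.EpubNav())
# Define the reading order (spine); without it, the chapter is not part of the book's reading flow
book.spine = ['nav', chapter]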
# Define CSS style
style = 'BODY {color: black;}'
nav_css = epub.EpubItem(uid='style_nav', file_name='style/nav.css', media_type='text/css', content=style)
book.add_item(nav_css)
# Create the ePUB file
epub.write_epub('sample_web_novel.epub', book, {})
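The chapter content above is a hard-coded string. Scraped text may contain characters such as & or <, which would produce invalid markup if pasted in directly. A minimal sketch of one way to build the chapter markup from plain text, using only the standard library; the helper names chapter_file_name and chapter_html are hypothetical:

```python
from html import escape


def chapter_file_name(number):
    """Zero-padded file name, following the chap_01.xhtml pattern."""
    return f"chap_{number:02d}.xhtml"


def chapter_html(title, paragraphs):
    """Build chapter markup from plain-text paragraphs, escaping special characters."""
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<h1>{escape(title)}</h1>{body}"


print(chapter_file_name(1))
print(chapter_html("Chapter 1", ["Tom & Jerry", "1 < 2"]))
```

The returned string could be assigned to chapter.content, and the generated file name passed as file_name when creating each epub.EpubHtml item.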
Convert to PDF:
from fpdf import FPDF
# Create a PDF document
pdf = FPDF()
pdf.add_page()
# Set font
pdf.set_font('Arial', size=12)
# Add a centered title cell; a width of 0 extends the cell to the right margin
pdf.cell(0, 10, txt='Sample Web Novel', ln=True, align='C')
# Add content
pdf.multi_cell(0, 10, 'This is the first chapter of the sample web novel.')
# Save the PDF
pdf.output('sample_web_novel.pdf')
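One practical caveat: as far as I know, the built-in fonts such as Arial only cover Latin-1 characters, so scraped text with curly quotes, long dashes, or non-Latin scripts can fail when the PDF is written. A minimal sketch of a workaround, assuming it is acceptable to substitute plain equivalents; the helper name to_latin1 and the replacement table are hypothetical choices for this example:

```python
# Common typographic characters mapped to plain equivalents.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # ellipsis
}


def to_latin1(text):
    """Replace common typographic characters, then turn anything
    still outside Latin-1 into '?'."""
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    return text.encode("latin-1", errors="replace").decode("latin-1")


print(to_latin1("\u201cHello\u201d \u2014 caf\u00e9\u2026"))
```

Text passed through to_latin1 could then be handed to pdf.cell or pdf.multi_cell. For novels in non-Latin scripts this substitution loses the text, so embedding a Unicode TrueType font would be the better route there.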
Convert to DOCX:
from docx import Document
# Create a Word document
doc = Document()
# Add a heading
doc.add_heading('Sample Web Novel', 0)
# Add a paragraph
doc.add_paragraph('This is the first chapter of the sample web novel.')
# Save the document
doc.save('sample_web_novel.docx')
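A real chapter is usually more than one paragraph. A minimal sketch of splitting raw chapter text into paragraphs before adding them to the document, assuming paragraphs are separated by blank lines; the helper name split_paragraphs and the sample text are hypothetical:

```python
import re


def split_paragraphs(text):
    """Split raw chapter text on blank lines and collapse whitespace
    inside each paragraph."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]


sample = "First paragraph,\nwrapped over two lines.\n\n\nSecond paragraph."
for paragraph in split_paragraphs(sample):
    print(paragraph)
```

Each returned string could be passed to doc.add_paragraph in a loop, so the DOCX keeps the chapter's paragraph breaks instead of one long block of text.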
Conclusion
By following these steps, you can scrape web novel data from a platform and convert it into ePUB, PDF, and DOCX formats for offline reading. This lets you enjoy your favorite stories on various devices without needing an internet connection. Happy reading!