Convert HTML/WebPage to PDF
Many websites do not allow their content to be downloaded as PDF: they either ask you to buy their premium version or do not offer a PDF download service at all.
Conversion in 3 steps from web page/HTML to PDF
Step 1: Install the pdfkit library
$ pip install pdfkit
Step 2: Install wkhtmltopdf
For Ubuntu/Debian:
sudo apt-get install wkhtmltopdf
For Windows:
[a] Download link: WKHTMLTOPDF
[b] Set: add the folder containing the wkhtmltopdf binary to the PATH environment variable.
Step 3: Python code to convert:
[i] Already saved HTML page
import pdfkit
pdfkit.from_file('test.html', 'out.pdf')
Your PDF file will be created and saved in the same directory as the Python file.
You can also pass a list with multiple URLs or files.
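Passing a list of URLs or files can be sketched as follows. This is a minimal illustration rather than part of the original article: `split_sources` and `convert_all` are hypothetical helper names, the output file names are placeholders, and the code assumes pdfkit and wkhtmltopdf are installed. pdfkit is imported inside the function so that the pure helper can be tried without it.

```python
from urllib.parse import urlparse


def split_sources(sources):
    """Separate web URLs from local file paths (hypothetical helper)."""
    urls, files = [], []
    for src in sources:
        # Only http/https count as URLs; everything else is treated as a file path.
        if urlparse(src).scheme in ("http", "https"):
            urls.append(src)
        else:
            files.append(src)
    return urls, files


def convert_all(sources, url_output="urls.pdf", file_output="files.pdf"):
    """Render all URLs into one PDF and all local HTML files into another."""
    import pdfkit  # lazy import: requires pdfkit and wkhtmltopdf to be installed

    urls, files = split_sources(sources)
    if urls:
        pdfkit.from_url(urls, url_output)     # from_url accepts a list of URLs
    if files:
        pdfkit.from_file(files, file_output)  # from_file accepts a list of files
```

Keeping URLs and files in separate calls is a design choice of this sketch: each pdfkit function expects one kind of input.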
[ii] Convert by website URL
import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')
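A slightly more defensive version of the URL conversion might look like the sketch below. The helper names `default_pdf_name` and `url_to_pdf` are hypothetical, and the error handling rests on the assumption that pdfkit reports a missing wkhtmltopdf binary or a failed render as an I/O error (`OSError` in Python 3), which matches its behavior as far as I know.

```python
from urllib.parse import urlparse


def default_pdf_name(url):
    """Build an output file name from the host part of a URL (hypothetical helper)."""
    host = urlparse(url).hostname or "page"
    return host + ".pdf"


def url_to_pdf(url, output=None):
    """Convert one web page to PDF; return the output path, or None on failure."""
    import pdfkit  # lazy import: requires pdfkit and wkhtmltopdf to be installed

    output = output or default_pdf_name(url)
    try:
        pdfkit.from_url(url, output)
    except OSError as exc:
        # Assumed failure mode: wkhtmltopdf missing or the render failed.
        print("Conversion failed:", exc)
        return None
    return output
```

Calling `url_to_pdf('http://google.com')` would then write `google.com.pdf` in the current directory when the conversion succeeds.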
How do you process a PDF in Python?
Python libraries for processing PDFs:
PDFMiner. PDFMiner is a tool for extracting information from PDF documents.
PyPDF2. PyPDF2 is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.
I was looking for a solution to print a web page to a local PDF file using Python. One good solution is to use Qt, found here: http://bharatikunal.wordpress.com/2010/01/.
It did not work at first because I had trouble installing PyQt4: the installation gave error messages.
That was because PyQt4 was not installed properly. My libraries were located at C:\Python27\Lib, but that is not where PyQt4 goes.
In fact, you simply need to download it from http://www.riverbankcomputing.com/software/pyqt/download (mind the exact Python version you are using) and install it to C:\Python27 (in my case). That's it.
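One way to confirm that a package such as PyQt4 is visible to the interpreter you are actually running is sketched below. It is an illustration only: `is_installed` is a hypothetical helper built on the standard library's `importlib.util.find_spec`, and it checks top-level module names without importing them.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found by this interpreter."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for malformed names or broken packages.
        return False


if __name__ == "__main__":
    for name in ("PyQt4", "PyQt5"):
        print(name, "found" if is_installed(name) else "missing")
```

Running it with the same Python executable that runs the conversion script shows whether the installation landed in the right place.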
Now the scripts run fine, so I want to share them. For more options when using QPrinter, please refer to http://qt-project.org/doc/qt-4.8/qprinter.html#orientation-enum.
Asked Apr 29, 2014 at 8:10
You can also use pdfkit:
Usage
import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')
Install
Install wkhtmltopdf for your platform (MacOS, Debian/Ubuntu, or Windows). On Debian/Ubuntu:
sudo apt-get install wkhtmltopdf
See the official documentation for MacOS/Ubuntu/other OS: https://github.com/JazzCore/python-pdfkit/wiki/Installing-wkhtmltopdf
Answered May 20, 2014 at 13:24 by NorthCat
Another option is WeasyPrint:
pip install weasyprint # No longer supports Python 2.x.
python
>>> import weasyprint
>>> pdf = weasyprint.HTML('http://www.google.com').write_pdf()
>>> len(pdf)
92059
>>> open('google.pdf', 'wb').write(pdf)
Answered Dec 23, 2015 at 15:04 by JohnMudd
https://github.com/disflux/django-mtr/blob/master/pdfgen/doc_overlay.py
# Python 2 script (PyQt4, pyPdf, reportlab)
import time
from pyPdf import PdfFileWriter, PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
from xhtml2pdf import pisa
import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *
url = 'http://www.yahoo.com'
tem_pdf = "c:\\tem_pdf.pdf"
final_file = "c:\\younameit.pdf"
app = QApplication(sys.argv)
web = QWebView()
# Read the URL given
web.load(QUrl(url))
printer = QPrinter()
# setting format
printer.setPageSize(QPrinter.A4)
printer.setOrientation(QPrinter.Landscape)
printer.setOutputFormat(QPrinter.PdfFormat)
# export file as c:\tem_pdf.pdf
printer.setOutputFileName(tem_pdf)
def convertIt():
    web.print_(printer)
    QApplication.exit()
QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)
app.exec_()
sys.exit  # not called, so this is a no-op and the script continues below
# Below is to add on the weblink as text and present date&time on PDF generated
outputPDF = PdfFileWriter()
packet = StringIO.StringIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.setFont("Helvetica", 9)
# Writing the new line
oknow = time.strftime("%a, %d %b %Y %H:%M")
can.drawString(5, 2, url)
can.drawString(605, 2, oknow)
can.save()
# move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(file(tem_pdf, "rb"))
pages = existing_pdf.getNumPages()
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
for x in range(0, pages):
    page = existing_pdf.getPage(x)
    page.mergePage(new_pdf.getPage(0))
    output.addPage(page)
# finally, write "output" to a real file
outputStream = file(final_file, "wb")
output.write(outputStream)
outputStream.close()
print final_file, 'is ready.'
Answered Apr 30, 2014 at 7:31 by Mark K
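For readers on Python 3, the stamping half of that script could be sketched roughly as follows. This is an adaptation and not the answer's code: it swaps in the `pypdf` package for `pyPdf`, the function names `stamp_text` and `stamp_pdf` are hypothetical, and the drawing coordinates are simply carried over from the script. The third-party imports sit inside the function, so the timestamp helper works on its own.

```python
import io
from datetime import datetime


def stamp_text(now):
    """Format the timestamp the same way as the original script."""
    return now.strftime("%a, %d %b %Y %H:%M")


def stamp_pdf(src_path, dst_path, url, now=None):
    """Overlay the URL and a timestamp on every page of an existing PDF."""
    # Lazy imports: both are third-party packages (pip install pypdf reportlab).
    from pypdf import PdfReader, PdfWriter
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    # Draw the overlay into an in-memory, single-page PDF.
    packet = io.BytesIO()
    can = canvas.Canvas(packet, pagesize=letter)
    can.setFont("Helvetica", 9)
    can.drawString(5, 2, url)
    can.drawString(605, 2, stamp_text(now or datetime.now()))
    can.save()
    packet.seek(0)
    overlay = PdfReader(packet).pages[0]

    # Merge the overlay onto each page and write the result.
    writer = PdfWriter()
    for page in PdfReader(src_path).pages:
        page.merge_page(overlay)
        writer.add_page(page)
    with open(dst_path, "wb") as out:
        writer.write(out)
```

A call such as `stamp_pdf("tem_pdf.pdf", "younameit.pdf", "http://www.yahoo.com")` would mirror the file names used in the script, with the paths shortened here for illustration.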
Per this answer (How to convert webpage into PDF by using Python), the advice is to use pdfkit. You also have to install wkhtmltopdf.
If you have the file test.html locally, then you need to use this command:
pdfkit.from_file('test.html', 'out.pdf')
But this will raise an error if you have not added the wkhtmltopdf executables to your system path. This is the part that tripped me up, and I wanted to share it.
On Windows, open your environment variables and add the folder containing the executables to your system Path variable. In my case, these files were in the folder created when I installed wkhtmltopdf from an exe.
Answered Jan 29, 2018 at 22:31 by Jarad
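If editing the system-wide environment variables is not convenient, the same effect can be approximated for a single Python process. The sketch below is an assumption-laden illustration: `ensure_on_path` and `wkhtmltopdf_available` are hypothetical helpers, and the change applies only to the running process and the programs it launches, not to the system settings.

```python
import os
import shutil


def ensure_on_path(folder):
    """Prepend `folder` to this process's PATH if it is not already listed."""
    entries = [e for e in os.environ.get("PATH", "").split(os.pathsep) if e]
    if folder not in entries:
        os.environ["PATH"] = os.pathsep.join([folder] + entries)


def wkhtmltopdf_available():
    """Return the resolved path of wkhtmltopdf, or None if PATH does not contain it."""
    return shutil.which("wkhtmltopdf")
```

Calling `ensure_on_path` with the folder that holds the wkhtmltopdf executables before using pdfkit should let the binary be found, and `wkhtmltopdf_available()` gives a quick way to confirm it.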
Here is the one that works fine:
import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *
app = QApplication(sys.argv)
web = QWebView()
web.load(QUrl("http://www.yahoo.com"))
printer = QPrinter()
printer.setPageSize(QPrinter.A4)
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setOutputFileName("fileOK.pdf")
def convertIt():
    # Called once the page has finished loading
    web.print_(printer)
    print("Pdf generated")
    QApplication.exit()
QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)
sys.exit(app.exec_())
Answered Apr 29, 2014 at 8:11 by Mark K
Here is a simple solution using Qt. I found it as part of an answer to a different question on Stack Overflow. I tested it on Windows.
from PyQt4.QtGui import QTextDocument, QPrinter, QApplication
import sys
app = QApplication(sys.argv)
doc = QTextDocument()
location = "c://apython//Jim//html//notes.html"
html = open(location).read()
doc.setHtml(html)
printer = QPrinter()
printer.setOutputFileName("foo.pdf")
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setPageSize(QPrinter.A4)
printer.setPageMargins(15, 15, 15, 15, QPrinter.Millimeter)
doc.print_(printer)
print("done!")
Answered Jan 20, 2015 at 20:38 by Jim Paul
I tried @NorthCat's answer using pdfkit.
It requires wkhtmltopdf to be installed. The installer can be downloaded from here: https://wkhtmltopdf.org/downloads.html
Install the executable. Then write a line that tells pdfkit where wkhtmltopdf is, like below.
import pdfkit
path_wkthmltopdf = "C:\\Folder\\where\\wkhtmltopdf.exe"
config = pdfkit.configuration(wkhtmltopdf=path_wkthmltopdf)
pdfkit.from_url("http://google.com", "out.pdf", configuration=config)
Answered Oct 18, 2019 at 2:09 by Mark K
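The explicit-path approach can be combined with a PATH lookup so the same script runs on machines where wkhtmltopdf is installed in different places. The sketch below is one possible arrangement, not the answer's code: `resolve_wkhtmltopdf` and `convert` are hypothetical names, and any explicit path passed in is a placeholder for your own installation.

```python
import os
import shutil


def resolve_wkhtmltopdf(explicit_path=None):
    """Return the explicit path if that file exists, else whatever PATH provides."""
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    return shutil.which("wkhtmltopdf")  # None when nothing is found


def convert(url, output, explicit_path=None):
    """Convert `url` to `output` using the resolved wkhtmltopdf binary."""
    import pdfkit  # lazy import: requires pdfkit to be installed

    binary = resolve_wkhtmltopdf(explicit_path)
    if binary is None:
        raise FileNotFoundError("wkhtmltopdf was not found; install it or pass its path")
    config = pdfkit.configuration(wkhtmltopdf=binary)
    pdfkit.from_url(url, output, configuration=config)
```

Failing early with a clear message when no binary is found is a design choice of this sketch; it avoids a less obvious error from deeper inside the conversion call.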
A PyQt5 version using QWebEngineView:
import sys
from PyQt5 import QtWidgets, QtWebEngineWidgets
from PyQt5.QtCore import QUrl
from PyQt5.QtGui import QPageLayout, QPageSize
from PyQt5.QtWidgets import QApplication
if __name__ == '__main__':
    app = QtWidgets.QApplication(sys.argv)
    loader = QtWebEngineWidgets.QWebEngineView()
    loader.setZoomFactor(1)
    layout = QPageLayout()
    layout.setPageSize(QPageSize(QPageSize.A4Extra))
    layout.setOrientation(QPageLayout.Portrait)
    loader.load(QUrl('https://stackoverflow.com/questions/23359083/how-to-convert-webpage-into-pdf-by-using-python'))
    # Quit the application once the PDF has been written
    loader.page().pdfPrintingFinished.connect(lambda *args: QApplication.exit())
    def emit_pdf(finished):
        # Start printing once the page has finished loading
        loader.page().printToPdf("test.pdf", pageLayout=layout)
    loader.loadFinished.connect(emit_pdf)
    sys.exit(app.exec_())
Answered Aug 6, 2020 at 19:39 by Y.kh