Guide: render HTML to PDF in Python

Convert HTML/WebPage to PDF

Many websites do not let you download their content as a PDF: they either require you to buy their premium version or simply do not offer a PDF download service.

Conversion in 3 steps from webpage/HTML to PDF

Step 1: Download the pdfkit library

 $ pip install pdfkit

Step 2: Download wkhtmltopdf

For Ubuntu/Debian:

 sudo apt-get install wkhtmltopdf

For Windows:
(a) Download link: WKHTMLTOPDF
(b) Set: add the wkhtmltopdf binary folder to the PATH variable in Environment Variables.

Step 3: Python code to convert

(i) Already saved HTML page

import pdfkit
pdfkit.from_file('test.html', 'out.pdf')

(ii) Convert using a website URL

import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')

Your PDF file will be created and saved in the same directory as the Python file.

You can also pass a list with multiple URLs or files.
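The pdfkit conversions described above can be gathered into one small helper. This is only a sketch: the names `classify_source` and `to_pdf` are made up for illustration, and the rule used to tell a URL from a file from raw HTML text is an assumption, not something pdfkit does for you. pdfkit is imported inside `to_pdf`, so the classification helper can be tried without pdfkit or wkhtmltopdf installed.

```python
import os
from urllib.parse import urlparse


def classify_source(source):
    """Guess which pdfkit entry point fits: 'url', 'file', or 'string'."""
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    if os.path.isfile(source):
        return "file"
    return "string"


def to_pdf(source, output_path):
    """Convert a URL, a saved HTML file, or raw HTML text to a PDF file."""
    import pdfkit  # imported here so classify_source works without pdfkit

    kind = classify_source(source)
    if kind == "url":
        return pdfkit.from_url(source, output_path)
    if kind == "file":
        return pdfkit.from_file(source, output_path)
    return pdfkit.from_string(source, output_path)


# Example calls (these need pdfkit and wkhtmltopdf installed):
#   to_pdf("http://google.com", "out.pdf")    # website URL
#   to_pdf("test.html", "out.pdf")            # saved HTML page, if the file exists
#   to_pdf("<h1>Hello</h1>", "hello.pdf")     # raw HTML text
```

Keeping the decision in a separate function makes it easy to check which pdfkit call a given input would trigger before any PDF is written.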


I was looking for a solution to print a web page to a local PDF file using Python. One good solution is to use Qt, found here: https://bharatikunal.wordpress.com/2010/01/.

It did not work at first because I had trouble installing PyQt4, which produced error messages.

That was because PyQt4 was not installed properly. I used to have my libraries located at C:\Python27\Lib, but that is not where PyQt4 goes.

In fact, you simply need to download it from http://www.riverbankcomputing.com/software/pyqt/download (mind the exact Python version you are using) and install it to C:\Python27 (in my case). That's it.

Now the script runs fine, so I want to share it. For more options when using QPrinter, please refer to http://qt-project.org/doc/qt-4.8/qprinter.html#orientation-enum.

asked Apr 29, 2014 at 8:10

You can also use pdfkit:

Usage

import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')

Install

pdfkit needs wkhtmltopdf, which is installed separately on MacOS, Debian/Ubuntu, and Windows.

See the official documentation for MacOS/Ubuntu/other OS: https://github.com/jazzcore/python-pdfkit/wiki/installing-wkhtmltopdf

answered May 20, 2014 at 13:24
NorthCat

Using WeasyPrint:

pip install weasyprint  # No longer supports Python 2.x.

python
>>> import weasyprint
>>> pdf = weasyprint.HTML('http://www.google.com').write_pdf()
>>> len(pdf)
92059
>>> open('google.pdf', 'wb').write(pdf)

answered Dec 23, 2015 at 15:04
JohnMudd
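WeasyPrint can also render HTML that you build in memory instead of fetching a URL. The sketch below assumes that use case; `build_page` and `save_pdf` are hypothetical helper names, and the page layout is invented for illustration. WeasyPrint is imported inside `save_pdf`, so the HTML builder can be tried on its own.

```python
import html


def build_page(title, paragraphs):
    """Return a small HTML document; text is escaped so it renders literally."""
    body = "\n".join("<p>{}</p>".format(html.escape(p)) for p in paragraphs)
    return (
        "<html><head><title>{0}</title></head>"
        "<body><h1>{0}</h1>\n{1}</body></html>"
    ).format(html.escape(title), body)


def save_pdf(markup, output_path):
    """Render an HTML string to a PDF file with WeasyPrint."""
    import weasyprint  # lazy import: build_page works without WeasyPrint

    weasyprint.HTML(string=markup).write_pdf(output_path)


# Example call (needs WeasyPrint installed):
#   save_pdf(build_page("Report", ["First paragraph", "a < b"]), "report.pdf")
```

Escaping the text before it goes into the markup keeps characters such as `<` from being read as tags when the page is rendered.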

Reference for adding text to an existing PDF: https://github.com/disflux/django-mtr/blob/master/pdfgen/doc_overlay.py

# Written for Python 2.7 with PyQt4, pyPdf and reportlab.
import sys
import time
import StringIO

from pyPdf import PdfFileWriter, PdfFileReader
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

url = 'http://www.yahoo.com'
tem_pdf = "c:\\tem_pdf.pdf"
final_file = "c:\\younameit.pdf"

app = QApplication(sys.argv)
web = QWebView()
# Read the given URL
web.load(QUrl(url))
printer = QPrinter()
# Set the output format
printer.setPageSize(QPrinter.A4)
printer.setOrientation(QPrinter.Landscape)
printer.setOutputFormat(QPrinter.PdfFormat)
# Export the page as c:\tem_pdf.pdf
printer.setOutputFileName(tem_pdf)

def convertIt():
    web.print_(printer)
    QApplication.exit()

QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)

# Run the Qt event loop until the page has been printed
app.exec_()

# Below: add the web link as text, plus the present date and time, to the generated PDF

packet = StringIO.StringIO()
# Create a new PDF with reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.setFont("Helvetica", 9)
# Write the new line
oknow = time.strftime("%a, %d %b %Y %H:%M")
can.drawString(5, 2, url)
can.drawString(605, 2, oknow)
can.save()

# Move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# Read the existing PDF
existing_pdf = PdfFileReader(open(tem_pdf, "rb"))
pages = existing_pdf.getNumPages()
output = PdfFileWriter()
# Add the "watermark" (which is the new PDF) on every existing page
for x in range(0, pages):
    page = existing_pdf.getPage(x)
    page.mergePage(new_pdf.getPage(0))
    output.addPage(page)
# Finally, write "output" to a real file
outputStream = open(final_file, "wb")
output.write(outputStream)
outputStream.close()

print("%s is ready." % final_file)

answered Apr 30, 2014 at 7:31
Mark K

Per this answer: How to convert webpage into PDF by using Python, the advice was to use pdfkit. You also have to install wkhtmltopdf.

If you have a local HTML file, then you need to use this command:

pdfkit.from_file('test.html', 'out.pdf')

But this will throw an error if you haven't added the wkhtmltopdf executables to your system path. This is the part that tripped me up, and I wanted to share it.

On Windows, open your Environment Variables and add the folder containing the wkhtmltopdf executables to your system path. In my case, that was the folder where the files were placed after I installed wkhtmltopdf from an exe.

answered Jan 29, 2018 at 22:31
Jarad

Here is one that works fine:

import sys 
from PyQt4.QtCore import *
from PyQt4.QtGui import * 
from PyQt4.QtWebKit import * 

app = QApplication(sys.argv)
web = QWebView()
web.load(QUrl("http://www.yahoo.com"))
printer = QPrinter()
printer.setPageSize(QPrinter.A4)
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setOutputFileName("fileOK.pdf")

def convertIt():
    web.print_(printer)
    print("Pdf generated")
    QApplication.exit()

QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)
sys.exit(app.exec_())

answered Apr 29, 2014 at 8:11
Mark K

Here is a simple solution using Qt. I found it as part of an answer to a different question on Stack Overflow. I tested it on Windows.

import sys

from PyQt4.QtGui import QTextDocument, QPrinter, QApplication

app = QApplication(sys.argv)

# Load the saved HTML page into a QTextDocument
doc = QTextDocument()
location = "c://apython//Jim//html//notes.html"
with open(location) as html_file:
    html = html_file.read()
doc.setHtml(html)

# Configure the printer to write an A4 PDF with 15 mm margins
printer = QPrinter()
printer.setOutputFileName("foo.pdf")
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setPageSize(QPrinter.A4)
printer.setPageMargins(15, 15, 15, 15, QPrinter.Millimeter)

doc.print_(printer)
print("done!")

answered Jan 20, 2015 at 20:38
Jim Paul

I tried @NorthCat's answer using pdfkit.

It requires wkhtmltopdf to be installed. The installer can be downloaded from here: https://wkhtmltopdf.org/downloads.html

Install the executable. Then write a line to indicate where wkhtmltopdf is, as below.

import pdfkit


path_wkthmltopdf = "C:\\Folder\\where\\wkhtmltopdf.exe"
config = pdfkit.configuration(wkhtmltopdf = path_wkthmltopdf)

pdfkit.from_url("http://google.com", "out.pdf", configuration=config)
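Instead of hard-coding the location, the path can be looked up first and the explicit path used only when it points to a real file. The following is a minimal sketch under that assumption; `resolve_wkhtmltopdf` and `url_to_pdf` are hypothetical names, and the error message is illustrative. `shutil.which` searches the folders listed in the PATH variable, which is the same place the earlier answers say wkhtmltopdf must be added.

```python
import os
import shutil


def resolve_wkhtmltopdf(explicit_path=None):
    """Return a usable wkhtmltopdf path, or None if it cannot be found."""
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    # Fall back to searching the folders listed in the PATH variable
    return shutil.which("wkhtmltopdf")


def url_to_pdf(url, output_path, explicit_path=None):
    """Convert a URL to PDF, telling pdfkit where wkhtmltopdf lives."""
    import pdfkit  # lazy import: the lookup above works without pdfkit

    binary = resolve_wkhtmltopdf(explicit_path)
    if binary is None:
        raise FileNotFoundError(
            "wkhtmltopdf not found; install it or add its folder to PATH"
        )
    config = pdfkit.configuration(wkhtmltopdf=binary)
    return pdfkit.from_url(url, output_path, configuration=config)


# Example call (needs pdfkit and wkhtmltopdf installed):
#   url_to_pdf("http://google.com", "out.pdf",
#              explicit_path="C:\\Folder\\where\\wkhtmltopdf.exe")
```

Failing early with a clear message is easier to diagnose than the error pdfkit raises when the executable is missing.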

answered Oct 18, 2019 at 2:09
Mark K

A variant using PyQt5 and QWebEngineView:

import sys
from PyQt5 import QtWidgets, QtWebEngineWidgets
from PyQt5.QtCore import QUrl
from PyQt5.QtGui import QPageLayout, QPageSize
from PyQt5.QtWidgets import QApplication

if __name__ == '__main__':
    app = QtWidgets.QApplication(sys.argv)
    loader = QtWebEngineWidgets.QWebEngineView()
    loader.setZoomFactor(1)
    layout = QPageLayout()
    layout.setPageSize(QPageSize(QPageSize.A4Extra))
    layout.setOrientation(QPageLayout.Portrait)
    loader.load(QUrl('https://stackoverflow.com/questions/23359083/how-to-convert-webpage-into-pdf-by-using-python'))
    loader.page().pdfPrintingFinished.connect(lambda *args: QApplication.exit())

    def emit_pdf(finished):
        loader.page().printToPdf("test.pdf", pageLayout=layout)

    loader.loadFinished.connect(emit_pdf)
    sys.exit(app.exec_())

answered Aug 6, 2020 at 19:39
Y.kh

How do I convert HTML to PDF?

To render an HTML web page to a PDF document, follow the steps below:

1. Specify the PDF file path to render the HTML web page to.
2. Specify the HTML source (URI).
3. Define the PDF document settings using the PdfSettings class.
4. Convert the HTML web page to a PDF file using the RenderToPdf method of the GcHtmlRenderer class.

How do I save output as a PDF in Python?

Approach:

1. Import the FPDF class from the fpdf module.
2. Add a page.
3. Set the font.
4. Insert a cell and provide the text.
5. Save the PDF with the ".pdf" extension.
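The fpdf steps above can be sketched as follows. This is an illustration, not the original article's code: `wrap_text` and `text_to_pdf` are hypothetical names, the 80-character width and the 12-point Helvetica font are arbitrary choices, and the built-in fonts are assumed to cover only Latin-1 text. The fpdf package is imported inside `text_to_pdf`, so the wrapping helper runs without it.

```python
import textwrap


def wrap_text(text, width=80):
    """Split text into lines of at most `width` characters."""
    return textwrap.wrap(text, width=width)


def text_to_pdf(text, output_path):
    """Write plain text to a PDF file, one cell per wrapped line."""
    from fpdf import FPDF  # step 1: import the FPDF class

    pdf = FPDF()
    pdf.add_page()                      # step 2: add a page
    pdf.set_font("Helvetica", size=12)  # step 3: set the font
    for line in wrap_text(text):
        pdf.cell(0, 8, line, ln=1)      # step 4: insert a cell with the text
    pdf.output(output_path)             # step 5: save with the .pdf extension


# Example call (needs the fpdf package installed):
#   text_to_pdf("Program output goes here", "output.pdf")
```

A width of 0 in `cell` lets the cell extend to the right margin, and `ln=1` moves to the start of the next line after each cell.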

How do I convert HTML code to Python?

Prerequisite: the html module. Given a string containing HTML character references, the task is to convert them into the corresponding characters of a plain string. This can be achieved with the help of the html module.

Syntax: html.unescape(string)

Example 1: Python 3.6+. Output: γeeks for γeeks.

Example 2: Python 2.6-3.3. We can use HTMLParser. Output: γeeks for γeeks.
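A short sketch of the Python 3 case, using only the standard library. The sample string is an assumption chosen to reproduce the output quoted above; `html.escape` is shown as well for the opposite direction.

```python
import html

# A string containing HTML character references
encoded = "&gamma;eeks for &gamma;eeks &lt;b&gt;"

# html.unescape turns named and numeric references into characters
decoded = html.unescape(encoded)
print(decoded)  # prints: γeeks for γeeks <b>

# html.escape goes the other way for the characters &, <, > and quotes
print(html.escape("<b>bold</b>"))  # prints: &lt;b&gt;bold&lt;/b&gt;
```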

How do you handle a PDF in Python?

Python libraries for processing PDFs:

1. PDFMiner. PDFMiner is a tool for extracting information from PDF documents.
2. PyPDF2. PyPDF2 is a pure-Python PDF library capable of splitting, merging together, cropping, and transforming the pages of PDF files.
3. pdfrw.
