Category: Python

  • Python code for scraping Chinese university rankings from gaosan.com

    # A simple Python scraper example that fetches Chinese university
    # ranking data for the last three years from gaosan.com.
    # Modify and extend it as needed.

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    # Request the ranking page on gaosan.com
    url = 'http://m.gaosan.com/gaokao/265440.html'
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on an HTTP error
    soup = BeautifulSoup(response.content, 'html.parser')

    # Extract the table rows
    table = soup.find('table')
    if table is None:
        raise SystemExit('No table found on the page.')
    rows = table.find_all('tr')

    # Parse the rows and store them in a DataFrame
    data = []
    for row in rows[1:]:  # skip the header row
        cols = [td.text.strip() for td in row.find_all('td')]
        if len(cols) < 5:  # skip malformed or merged rows
            continue
        # rank, school name, overall score, star rating, school level
        data.append(cols[:5])

    df = pd.DataFrame(data, columns=['Rank', 'School', 'Score', 'Star rating', 'Level'])

    # Save the data to a CSV file
    df.to_csv('ranking.csv', index=False, encoding='utf-8')

    print("Data saved to ranking.csv.")
    
  • Using the Python library NiceGUI: interactive browser pages

    from nicegui import ui
    
    ui.label('Hello NiceGUI!')
    #gui button "Click Me!"
    ui.button('Click me!', on_click=lambda e: ui.notify('Clicked!'))
    def greet(name):
        return f'Hello {name}'
    #gui button "Greet"
    ui.button('Greet', on_click=lambda e: ui.notify(greet('NiceGUI')))
    
    
    ui.icon('thumb_up')
    ui.markdown('This is **Markdown**.')
    ui.html('This is <strong>HTML</strong>.')
    with ui.row():
        ui.label('CSS').style('color: #888; font-weight: bold')
        ui.label('Tailwind').classes('font-serif')
        ui.label('Quasar').classes('q-ml-xl')
    ui.link('NiceGUI on GitHub', 'https://github.com/zauberzeug/nicegui')
    
    
    ui.html('<h3>---Common UI Elements----</h3>')
    
    
    from nicegui.events import ValueChangeEventArguments
    
    def show(event: ValueChangeEventArguments):
        name = type(event.sender).__name__
        ui.notify(f'{name}: {event.value}')
    
    ui.button('Button', on_click=lambda: ui.notify('Click'))
    with ui.row():
        ui.checkbox('Checkbox', on_change=show)
        ui.switch('Switch', on_change=show)
    ui.radio(['A', 'B', 'C'], value='A', on_change=show).props('inline')
    with ui.row():
        ui.input('Text input', on_change=show)
        ui.select(['One', 'Two'], value='One', on_change=show)
    ui.link('And many more...', '/documentation').classes('mt-8')
    
    
    ui.label('---Value Bindings----')
    ui.label('Binding values between UI elements and data models is built into NiceGUI.')
    class Demo:
        def __init__(self):
            self.number = 1
    
    demo = Demo()
    v = ui.checkbox('visible', value=True)
    with ui.column().bind_visibility_from(v, 'value'):
        ui.slider(min=1, max=3).bind_value(demo, 'number')
        ui.toggle({1: 'A', 2: 'B', 3: 'C'}).bind_value(demo, 'number')
        ui.number().bind_value(demo, 'number')
    
    
    ui.run()

    (Screenshot of the running page)

  • Python OCR (character recognition) with the Tesseract library

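    Tesseract is a powerful open-source OCR (optical character recognition) engine that extracts text from images. Below is example code for text recognition with Python and Tesseract.

    First, make sure Tesseract is installed. Follow the installation guide on the official site and set the environment variables.

    Next, call Tesseract from Python through the pytesseract library. The example below loads an image, applies simple image processing, and then runs Tesseract on it: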

    # -*- coding: utf-8 -*-
    """
    Created on Fri Mar 22 15:06:57 2024
    OCR character recognition with Tesseract
    @author: cnliutz
    """
    # 1. Install the Tesseract OCR engine
    # Tesseract installer download: https://github.com/UB-Mannheim/tesseract/wiki
    # Tesseract language packs (all languages): https://github.com/tesseract-ocr/tessdata
    # Simplified Chinese pack: https://github.com/tesseract-ocr/tessdata/blob/main/chi_sim.traineddata
    # Place chi_sim.traineddata at C:\Program Files\Tesseract-OCR\tessdata\chi_sim.traineddata
    # 2. Open a command prompt and install the required libraries
    #pip install pytesseract
    #pip install pillow
    #pip install opencv-python
    #pip install numpy
    
    
    # Import the required libraries
    from PIL import Image   # pip install pillow
    import pytesseract
    import argparse
    import cv2              # pip install opencv-python
    import os

    # Set the path to the Tesseract OCR engine
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

    # Build the argument parser and parse the arguments
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True, help="path to the input image to OCR")
    ap.add_argument("-p", "--preprocess", type=str, default="thresh", help="preprocessing type: 'thresh' or 'blur'")
    args = vars(ap.parse_args())

    # Load the image and convert it to grayscale
    image = cv2.imread(args["image"])
    if image is None:  # cv2.imread returns None when the file cannot be read
        raise SystemExit("Could not read image: " + args["image"])
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Preprocess the image according to the chosen type
    if args["preprocess"] == "thresh":
        gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    elif args["preprocess"] == "blur":
        gray = cv2.medianBlur(gray, 3)

    # Save the grayscale image to a temporary file for OCR
    filename = "{}.png".format(os.getpid())
    cv2.imwrite(filename, gray)

    # Run Tesseract on the image
    text = pytesseract.image_to_string(Image.open(filename))
    os.remove(filename)  # delete the temporary file

    # Print the recognized text
    print(text)
    

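    The code above first loads the image with OpenCV and converts it to grayscale. It then preprocesses the image according to the chosen type (thresholding or blurring). Finally, it writes the result to a temporary file, opens that file with Pillow, and passes it to pytesseract.image_to_string() for text recognition.
    Save the code as ocr.py and run it from the command line: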

    python ocr.py --image images/example.png -p blur
    

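    This runs text recognition on the example image. Replace images/example.png with the path of the image you actually want to recognize.

    For installing Tesseract on your computer and handling Chinese text, see: https://blog.csdn.net/weixin_41013322/category_8760189.html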

    # Simplified OCR code
    import pytesseract
    from PIL import Image
    
    # Set the path to the Tesseract OCR engine
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR/tesseract.exe'
    
    
    # Open an image file
    image_path = r'C:\Users\czliu\Documents\python/wz.png'
    image = Image.open(image_path)
    
    # Perform OCR on the image
    text = pytesseract.image_to_string(image,lang='chi_sim')
    
    # Print the extracted text
    print(text)
    
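  • How to use the Python schedule library for automated scheduled tasks

    When you need to run tasks periodically in Python, the schedule library is a very practical tool for automating them. Here are some usage examples: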

    1. Basic usage:

      import schedule
      import time

      def job():
          print("I'm working...")

      schedule.every(10).minutes.do(job)

      while True:
          schedule.run_pending()
          time.sleep(1)

      The code above runs the job function every 10 minutes.
    2. More scheduling examples:

      import schedule
      import time

      def job():
          print("I'm working...")

      # Run the job every ten minutes
      schedule.every(10).minutes.do(job)
      # Run the job every hour
      schedule.every().hour.do(job)
      # Run the job every day at 10:30
      schedule.every().day.at("10:30").do(job)
      # Run the job every Monday
      schedule.every().monday.do(job)
      # Run the job every Wednesday at 13:15
      schedule.every().wednesday.at("13:15").do(job)
      # Run the job at the 17th second of every minute
      schedule.every().minute.at(":17").do(job)

      while True:
          schedule.run_pending()
          time.sleep(1)
    3. Running a job only once:

      import schedule
      import time

      def job_that_executes_once():
          # The task written here runs only once...
          return schedule.CancelJob

      schedule.every().day.at('22:30').do(job_that_executes_once)

      while True:
          schedule.run_pending()
          time.sleep(1)
    4. Passing arguments to a job:

      import schedule

      def greet(name):
          print('Hello', name)

      # Pass extra arguments through to the job function
      schedule.every(2).seconds.do(greet, name='Alice')
      schedule.every(4).seconds.do(greet, name='Bob')
    5. Getting all current jobs: all_jobs = schedule.get_jobs()
    6. Cancelling all jobs: schedule.clear()
    7. Tags:

      # Tag the jobs
      schedule.every().day.do(greet, 'Andrea').tag('daily-tasks', 'friend')
      schedule.every().hour.do(greet, 'John').tag('hourly-tasks', 'friend')
      # Get all jobs with a given tag
      friends = schedule.get_jobs('friend')
      # Cancel all jobs tagged daily-tasks
      schedule.clear('daily-tasks')
    8. Setting a job deadline:

      import schedule
      from datetime import datetime, timedelta, time

      def job():
          print('Boo')

      # Run the job every hour, stopping after 18:30
      schedule.every(1).hours.until("18:30").do(job)
      # Other deadline settings...

    These examples cover intervals from seconds to weeks; choose whichever scheduling style suits your needs.

  • Drawing with Python Pillow

    from pathlib import Path
    from PIL import Image, ImageDraw

    def draw_ellipse(draw, x, y, width, height, fill_color):
        draw.ellipse([x, y, x + width, y + height], fill=fill_color)

    def draw_rectangle(draw, x, y, width, height, fill_color):
        draw.rectangle([x, y, x + width, y + height], fill=fill_color)

    def draw_triangle(draw, x1, y1, x2, y2, x3, y3, fill_color):
        draw.polygon([(x1, y1), (x2, y2), (x3, y3)], fill=fill_color)

    filename = input("Enter the file name: ")
    # Always save as PNG, whatever extension (if any) was typed;
    # fall back to "output" when nothing was entered
    output_path = Path(filename or "output").with_suffix(".png")

    # 500x500 canvas with a fully transparent background
    image = Image.new("RGBA", (500, 500), (0, 0, 0, 0))
    draw = ImageDraw.Draw(image)

    draw_ellipse(draw, 50, 50, 200, 200, (255, 0, 0))
    draw_rectangle(draw, 250, 50, 100, 100, (0, 255, 0))
    draw_triangle(draw, 50, 50, 300, 100, 150, 200, (0, 0, 255))

    image.save(output_path)
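  • Ways to get a file name and extension in Python

    1. Extracting the name and extension by slicing

    Create a variable, filename. Assuming that it has a three-letter extension, and using slice operations, find the extension. For README.txt, the extension should be txt. Write code using slice operations that will give the name without the extension. Does your code work on filenames of arbitrary length?

    The code:

    filename = "readme.txt"
    extension = filename[-3:]
    print("The extension is", extension)
    name = filename[:len(filename)-4]
    print("The name without extension is", name)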
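
    To answer the exercise's question: the slices work for a name of any length, but only while the extension is exactly three letters. A minimal sketch of that limit, using two hypothetical helper functions (split_by_slice and split_by_last_dot are illustrative names, not part of the original post):

```python
def split_by_slice(filename):
    """Fixed-width slicing: assumes a dot followed by a 3-letter extension."""
    return filename[:-4], filename[-3:]


def split_by_last_dot(filename):
    """Split at the last dot, whatever the extension length."""
    name, dot, extension = filename.rpartition(".")
    if not dot:  # no dot at all: the whole string is the name
        return filename, ""
    return name, extension


for example in ("readme.txt", "a_much_longer_file_name.txt", "index.html"):
    print(example, split_by_slice(example), split_by_last_dot(example))
```

    For index.html the slice version returns ('index.', 'tml'), while splitting at the last dot returns ('index', 'html').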

    2. Extracting the name and extension with os.path and pathlib

    Python offers several ways to find a file name's extension. One is the os.path module, whose splitext() function splits a file path into the name and the extension. Another is the pathlib module, whose Path class has a suffix property that returns the extension. Here are code examples for both:

    Using os.path.splitext():

    import os
    filename = "README.txt" # create a variable with the file name
    name, extension = os.path.splitext(filename) # split the file name and extension
    print("The extension is", extension) # print the extension
    print("The name without extension is", name) # print the name without extension

    Using pathlib.Path().suffix:

    from pathlib import Path
    filename = "README.txt" # create a variable with the file name
    file_path = Path(filename) # create a Path object
    extension = file_path.suffix # get the extension
    print("The extension is", extension) # print the extension
    name = file_path.stem # get the name without extension
    print("The name without extension is", name) # print the name without extension

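    Both approaches work on filenames of arbitrary length, as long as a dot (.) separates the name from the extension. If the filename has no dot, or more than one, the result depends on how each approach treats dots. For example, for ".bashrc", os.path.splitext() returns (".bashrc", "") as the name and extension, and the pathlib.Path().suffix property returns "" as the extension. Similarly, for "foo.bar.tar.gz", os.path.splitext() returns ("foo.bar.tar", ".gz"), and pathlib.Path().suffix returns ".gz". To get every extension of a filename with multiple dots, use the pathlib.Path().suffixes property, which returns a list of extensions. For example, pathlib.Path("foo.bar.tar.gz").suffixes returns [".bar", ".tar", ".gz"].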



    Program output (slicing, os.path.splitext(), then pathlib):

    The extension is txt
    The name without extension is readme


    The extension is .txt
    The name without extension is README


    The extension is .txt
    The name without extension is README
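
    The edge cases described above (a leading dot, several dots, no dot) can be checked directly. A small sketch that runs both approaches over the same sample names; the list of names is only illustrative:

```python
import os
from pathlib import Path

# Sample names: ordinary, leading dot, several dots, no dot
for filename in ("README.txt", ".bashrc", "foo.bar.tar.gz", "README"):
    name, extension = os.path.splitext(filename)
    path = Path(filename)
    print(f"{filename!r}: splitext={(name, extension)}, "
          f"stem={path.stem!r}, suffix={path.suffix!r}, suffixes={path.suffixes}")
```

    Printing the results side by side makes it easy to see where a name is treated as having no extension and where only the last of several extensions is returned.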

  • Python class and function examples

    # Creating a class
    class Calculator:
        def __init__(self):
            print("A calculator has been created.")
        def add(self, x, y):
            return x + y
        def subtract(self, x, y):
            return x - y
        def multiply(self, x, y):
            return x * y
    cal = Calculator()
    print(cal.add(2,3))
    print(cal.subtract(7,2))
    print(cal.multiply(3,8))
    
    # Function
    def camel_to_snake(s, separator='_'):
        result = ''
        for c in s:
            if c.isupper():
                result += separator + c.lower()
            else:
                result += c
        # Drop the separator added in front of a leading capital letter
        if s[:1].isupper():
            result = result[len(separator):]
        return result

    print(camel_to_snake("ThisIsCamelCased"))  # Output: "this_is_camel_cased"
    print(camel_to_snake("ThisIsCamelCased", "-"))  # Output: "this-is-camel-cased"
    
  • Ways to debug Python programs

    The pdb module
    Python also includes a debugger to step through code. It is found in a module named pdb. This library is modeled after the gdb library for C. To drop into the debugger at any point in a Python program, insert the code:

    import pdb; pdb.set_trace()

    There are two statements here, but I typically type them on a single line separated by a semicolon; that way I can easily remove them with a single keystroke from my editor when I am done debugging. This is also about the only place I use a semicolon in Python code (two statements in a single line).

    When this line is executed, it will present a (pdb) prompt, which is similar to the REPL. Code can be evaluated at this prompt and you can inspect objects and variables as well. Also, breakpoints can be set for further inspection.

    Below is a table listing useful pdb commands:

    Command              Purpose
    h, help              List the commands available
    n, next              Execute the next line
    c, cont, continue    Continue execution until a breakpoint is hit
    w, where, bt         Print a stack trace showing where execution is
    u, up                Pop up a level in the stack
    d, down              Push down a level in the stack
    l, list              List source code around current line

    Run the following code in IDLE and use the commands h, n, c, w, u, d and l to debug the program:

    # Fibonacci series up to n
    import pdb; pdb.set_trace()
    def fib(n):
        a, b = 0, 1
        while a < n:
            print(a, end=' ')
            a, b = b, a+b
        print()
    
    fib(1000)
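
    On Python 3.7 and later, the built-in breakpoint() function opens the same pdb prompt by default as import pdb; pdb.set_trace(). A minimal sketch, assuming a hypothetical variant of the function above named fib_values with a debug flag (neither is in the original), so the debugger only starts on request:

```python
def fib_values(n, debug=False):
    """Return the Fibonacci numbers below n as a list."""
    values = []
    a, b = 0, 1
    while a < n:
        if debug:
            breakpoint()  # drops into pdb here; inspect a, b and values
        values.append(a)
        a, b = b, a + b
    return values


# debug stays False here, so this runs without stopping
print(fib_values(50))
```

    Calling fib_values(50, debug=True) stops on every loop iteration; c continues to the next stop and q quits the debugger.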
    
  • Checking a file name's suffix in Python

    ```python
    >>> xl = 'Oct2000.xls'
    >>> xl.endswith('.xls')
    True
    >>> xl.endswith('.xlsx')
    False