Posts

R语言实用技巧与数据分析实践

前言 R语言是专为统计分析和数据可视化而设计的编程语言,在数据科学、生物信息学、金融分析等领域有着广泛的应用。本文将分享R语言在数据分析中的实用技巧和最佳实践。向量操作处理NA值在R语言中,NA(Not Available)表示缺失值。正确处理NA值是数据清洗的重要步骤。过滤NA值如果我们有一个vector叫x,x的值中有NA,如果我们想要过滤掉x中的NA,并把过滤后的结果赋值给变量y: # 创建包含NA的向量 x <- c(1, 2, NA, 4, 5, NA, 7, 8, NA, 10) # 过滤NA值 y <- x[!is.na(x)] print(y) # [1] 1 2 4 5 7 8 10 组合条件过滤如果我们再想找出y中元素大于0的元素: y[y > 0] # [1] 1 2 4 5 7 8 10 以上两步合在一起: # 方法1:先过滤NA再过滤值 y <- x[!is.na(x)] result <- y[y > 0] # 方法2:一步完成 result <- x[!is.na(x) & x > 0] 常见错误如果我们直接使用x[x > 0]是不可行的,会得到如下的包含NA值的一个vector: x[x > 0] # [1] 1 2 NA 4 5 NA 7 8 NA 10 原因:NA与任何值的比较结果都是NA,所以NA会保留在结果中。解决方法:总是先检查NA,再做其他操作。 # 错误方式 result <- x[x > 0] # 正确方式 result <- x[!is.na(x) & x > 0] 矩阵操作创建矩阵创建一个4行5列的matrix,包含的数值是从1到20: ...

工作回顾与职业发展思考

引言 2019年年中开始,后面的工作内容应该有所调整。自从16年6月底,从上海回到合肥,加入到华米科技,到现在整整3年了。工作历程回顾第一阶段(2016年中 - 2017年初) 基本一个人在做数据分析和报表。这段时间是快速成长的阶段: 独立负责数据分析工作搭建数据报表体系熟悉业务和数据结构提升技术能力和业务理解第二阶段(2017年初 - 2018年中) 带了一个新加入的同事A一起做数据分析和ETL等相关工作。开始从个人贡献者向团队协作者转变: 学习如何带领新人分工协作,提高效率 ETL流程优化建立更完善的数据分析体系第三阶段(2018年中 - 2019年中) A去做上游的导数的事情,分析由我和新加入的B和C,两个妹子,一起来完成。同时,自己也从大数据工程师,升级成了高级大数据工程师: 团队规模扩大工作内容更加聚焦技术深度和广度都有提升开始思考职业发展方向职业转型的思考到了19年年中, 为什么想从大数据分析,转到人工智能实验室团队去做更多的AI直接相关的事情呢? 我想主要还是想去探索数据价值发挥的一个新路径吧。毕竟,描述性的统计分析,这个我已经做了三年了。而描述性数据分析的价值有它的局限性。而关于数据的更地道的挖掘和分析: 特征选取建模模型评估这些都是自己的薄弱点,也是我所认为的一个合格的data scientist必须掌握的。更何况,自己在算法和机器学习这块,并非是没有基础。人生那么长,总不能一辈子做基础的描述性的统计分析/业务分析还有做报表吧。过往的学习准备下面列出一些以前学习过的课程和材料吧,算是对过往准备工作的一个总结。理论基础台大林轩田的课程《机器学习基石》《机器学习技巧》对应的英文教材《Learning From Data》这些课程打下了坚实的机器学习理论基础,特别是对机器学习的核心概念和算法有了深入理解。吴恩达的课程《机器学习》《深度学习》完成了coursera上的深度学习的几门课程课后作业有点水,因为很多都可以通过上下文得到,但是不得不承认,是好的课后作业。其他课程周志华的西瓜书《机器学习》李航的《统计学习方法》《The Elements of Statistical Learning》(看了一点点) 数学基础概率统计的相关知识线性代数的相关知识平时都有所复习实践经验工具使用 scikit-learn:常用的机器学习算法库 pandas:数据处理和分析 numpy:数值计算项目经验 1. 逻辑回归和时间序列分析 ...

Python数据分析完整指南：NumPy、Pandas与可视化实战

Python数据分析生态系统包含了多个强大的库，它们各自承担不同的职责。本文将整合NumPy、Pandas、Matplotlib和Seaborn的核心功能，提供一份完整的数据分析实战指南。环境配置与最佳实践 Jupyter Notebook优化设置在Mac上使用Jupyter时，通过以下配置可以显著提升绘图质量。在~/.ipython/profile_default/ipython_kernel_config.py中添加： c.IPKernelApp.matplotlib = 'inline' c.InlineBackend.figure_format = 'retina' 绘图样式设置使用更美观的绘图样式： import matplotlib.pyplot as plt plt.style.use('seaborn') # 查看所有可用样式 print(plt.style.available) 推荐使用的样式包括：'seaborn'、'ggplot'、'bmh'等。核心库导入 import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns from sklearn.datasets import load_iris NumPy：高效数值计算基础 NumPy是Python数据分析的基石，提供了高性能的多维数组和数值计算功能。数组创建与基本操作 # 加载示例数据 iris = load_iris() df = pd.DataFrame(iris.data, columns=iris.feature_names) df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label'] # 选择特定列 data = np.array(df.iloc[:100, [0, 1, -1]]) print(f"DataFrame shape: {df.shape}") # (150, 5) print(f"Data shape: {data.shape}") # (100, 3) 数据提取技巧从ndarray提取数据和标签： # 假设最后一列是标签，前面是特征 X = dataset[:, 0:-1] # 或 dataset[:, 0:8] y = dataset[:, 8] 从DataFrame提取数据和标签： X, y = data.iloc[:, :-1], data.iloc[:, -1] 分离特征和标签： # 获取前两列特征和最后一列标签 X, y = data[:, :-1], data[:, -1] print(f"X shape: {X.shape}") # (100, 2) print(f"y shape: {y.shape}") # (100,) 数组切片的重要区别理解切片的维度差异至关重要： ...

Linux系统管理速查手册：常用命令与问题排查

在日常的系统管理工作中，我们经常需要处理各种常见的配置和监控任务。本文整理了Linux系统管理中最常用的操作命令，包括软件安装、时区配置和资源监控，帮助你快速定位和解决问题。 Node.js安装与配置 Node.js是现代Web开发中不可或缺的运行时环境。在CentOS系统上，我们可以通过NodeSource官方源快速安装最新版本。使用NodeSource安装Node.js NodeSource提供了Node.js的官方RPM包，确保我们能够获得最新的稳定版本。安装步骤 # 1. 添加NodeSource仓库（以Node.js 8.x为例） curl --silent --location https://rpm.nodesource.com/setup_8.x | sudo bash - # 2. 使用yum安装Node.js sudo yum -y install nodejs # 3. 验证安装 node -v npm -v 版本选择根据项目需求选择合适的Node.js版本： LTS版本：生产环境推荐使用，长期支持 Current版本：最新特性，适合开发测试 # Node.js 16.x LTS curl --silent --location https://rpm.nodesource.com/setup_16.x | sudo bash - # Node.js 18.x LTS curl --silent --location https://rpm.nodesource.com/setup_18.x | sudo bash - 安装后配置 # 配置npm国内镜像源（加速包下载） npm config set registry https://registry.npmmirror.com # 全局安装常用工具 npm install -g pm2 # 进程管理器 npm install -g yarn # 包管理工具 npm install -g npx # 包执行器系统时区配置正确的时区配置对于日志记录、定时任务和系统监控至关重要。Linux系统使用timedatectl命令来管理系统时区和时间设置。 ...

Spark大数据处理完全手册：从基础到进阶

Apache Spark是当前最流行的大数据处理框架之一，以其高效、易用和强大的功能著称。本文将从实践角度出发，全面介绍Spark的核心功能和使用技巧，帮助读者快速掌握大数据处理的必备技能。目录环境配置与问题排查 Spark SQL核心函数与操作文件读写与数据处理排序与分区控制 PySpark实战技巧 MLlib机器学习应用环境配置与问题排查 Mac单机PySpark环境配置在Mac上搭建本地PySpark环境时，可能会遇到主机名解析问题： Caused by: java.net.UnknownHostException: master: nodename nor servname provided, or not known 解决方案：修改$SPARK_HOME/conf/spark-env.sh配置文件： export SPARK_MASTER_IP=StevenMac export SPARK_LOCAL_IP=StevenMac 在/etc/hosts文件中添加： 127.0.0.1 StevenMac PySpark函数导入问题使用PySpark时，如果遇到无法找到col函数的问题： from pyspark.sql.functions import col # 报错：找不到col函数解决方案：安装PySpark类型存根（stubs）： pip install pyspark-stubs 这样不仅能解决导入问题，还能提供更好的IDE自动补全支持。初始化SparkSession from pyspark.sql import SparkSession spark = SparkSession.builder \ .appName('MySparkApplication') \ .config("spark.executor.memory", "3g") \ .config("spark.executor.cores", "8") \ .getOrCreate() Spark SQL核心函数与操作 Spark SQL提供了丰富的API用于结构化数据处理，以下是核心操作函数的完整指南。常用函数速查数据选择与转换： select - 选择列 lit - 创建字面量值 withColumn - 添加或替换列 withColumnRenamed - 重命名列 cast - 类型转换过滤与聚合： ...

Python开发环境配置与管理最佳实践

Python开发环境的合理配置是项目成功的基础。本文将整合虚拟环境管理、包安装优化和运行时配置三个关键主题，帮助你构建高效、规范的Python开发环境。一、使用pipenv管理虚拟环境 pipenv是Python官方推荐的包管理工具，它结合了pip和virtualenv的功能，为项目提供依赖管理和虚拟环境隔离。 1.1 导出现有环境的依赖当你在某个Python环境中已经安装了多个包，需要将其迁移到新环境时，可以使用pip freeze命令导出依赖列表： pip freeze > requirements.txt 这个命令会生成一个包含所有已安装包及其版本的requirements.txt文件，是环境迁移的第一步。 1.2 使用pipenv创建项目环境创建新项目并初始化pipenv环境的完整流程： # 创建项目目录并移动依赖文件 mkdir myproject && mv requirements.txt myproject && cd myproject # 指定Python版本创建虚拟环境 pipenv --python 3.6 # 激活虚拟环境 pipenv shell # 安装依赖（开发模式） pipenv install --dev 命令说明： pipenv --python 3.6：指定Python 3.6创建虚拟环境 pipenv shell：激活虚拟环境并进入子shell pipenv install --dev：安装requirements.txt中的所有依赖，包括开发依赖 1.3 虚拟环境管理 pipenv会在项目目录中创建Pipfile和Pipfile.lock文件，用于精确记录依赖关系。虚拟环境默认存储在~/.local/share/virtualenvs目录下。删除虚拟环境： # 方法1：使用pipenv命令（推荐） pipenv --rm # 方法2：手动删除虚拟环境目录 rm -rf ~/.local/share/virtualenvs/你的项目名称-XXXXX 1.4 pipenv最佳实践始终使用虚拟环境：避免全局污染，保持项目依赖隔离提交Pipfile和Pipfile.lock：确保团队成员使用相同的依赖版本分离开发和生产依赖：使用--dev参数区分环境定期更新依赖：使用pipenv update保持依赖最新二、使用国内镜像源加速包安装 PyPI官方服务器在国外，直接访问速度较慢。使用国内镜像源可以显著提升包安装速度。 ...

Hadoop生态系统部署实践：从数据准备到集群配置

本文将详细介绍Hadoop生态系统各组件的实际部署与配置经验，涵盖数据准备、Presto调试、HUE用户管理和Zeppelin集成等关键环节。一、气象数据准备与处理《Hadoop权威指南》是一本经典的Hadoop学习资料，书中使用了NCDC（国家气候数据中心）的气象数据作为示例。这些真实的气象数据不仅有助于理解Hadoop的工作原理，更能培养处理复杂数据的实战能力。 1.1 数据下载 NCDC提供了丰富的历史气象数据，我们可以通过脚本批量下载： #!/bin/bash # 进入目标下载目录 cdir="$(cd `dirname $0`; pwd)" # 下载1930-1960年的气象数据 # 注意：tar文件从1930年开始才有实际数据 for i in $(seq 1930 1960) do wget --execute robots=off \ --accept=tar \ -r -np -nH \ --cut-dirs=4 \ -R index.html* \ ftp://ftp.ncdc.noaa.gov/pub/data/gsod/$i/ done 1.2 数据预处理下载完成后，需要重新组织文件结构： # 将 1930/gsod_1930.tar 重命名为 1930/1930.tar # 并将所有文件集中到gsod目录 # 最终结构：gsod/1930/1930.tar, gsod/1931/1931.tar ... 1.3 HDFS数据上传在HDFS上创建目录并上传数据： # 创建HDFS目录 hdfs dfs -mkdir /GSOD /GSOD_ALL # 上传数据文件 hdfs dfs -put gsod/* /GSOD/ 常见问题处理： ...

Shell文本处理三剑客：sed、awk与grep实战指南

在Unix/Linux系统中，sed、awk和grep被称为文本处理三剑客。它们各自擅长不同的文本处理任务，配合使用可以解决绝大多数文本处理需求。本文将通过实战示例帮助你掌握这些工具的核心用法。 grep：文本搜索利器基本搜索 grep主要用于在文件中搜索匹配特定模式的行。 # 在文件中搜索单词 grep "pattern" file.txt # 递归搜索目录 grep -r "pattern" /path/to/dir # 忽略大小写 grep -i "pattern" file.txt # 显示行号 grep -n "pattern" file.txt # 反向匹配（不包含pattern的行） grep -v "pattern" file.txt # 统计匹配行数 grep -c "pattern" file.txt 多文件操作 # 查找在多个文件中都存在的行 grep -F -x -f file1 file2 file3 # 查找在file1中但不在file2中的行 grep -F -x -v -f file2 file1 # 在多个文件中搜索 grep "pattern" file1.txt file2.txt file3.txt 参数说明： -F：将模式视为固定字符串而非正则表达式 -x：整行匹配 -f：从文件读取模式 -v：反向选择实用示例 # 查找包含"error"或"warning"的行 grep -E "(error|warning)" logfile.txt # 查找以"#"开头的注释行 grep "^#" config.conf # 查找空行 grep "^$" file.txt # 查找非空行 grep -v "^$" file.txt # 查找恰好10个字符的行 grep -E "^.{10}$" file.txt # 递归查找当前目录下所有.py文件中的"TODO" grep -r "TODO" --include="*.py" . sed：流编辑器 sed是一个强大的流编辑器，擅长进行文本替换和删除操作。 ...

Shell脚本编程最佳实践

Shell脚本是系统管理和自动化任务的利器。本文将带你从基础到高级，全面掌握Shell脚本编程的最佳实践。 Shell基础第一个Shell脚本 #!/bin/bash # 这是一个注释 echo "Hello, World!" # 变量赋值和使用 name="World" echo "Hello, $name!" # 命令替换 current_date=$(date) echo "Today is: $current_date" # 反引号方式（不推荐） current_date=`date` echo "Today is: $current_date" Shebang说明： #!/bin/bash：使用bash解释器 #!/bin/sh：使用sh解释器（更通用） #!/usr/bin/env bash：自动查找bash（更便携）变量和数据类型 # 字符串变量 greeting="Hello" name="Alice" # 只读变量 readonly PI=3.14159 # 删除变量 unset name # 环境变量 export PATH=$PATH:/new/path # 字符串拼接 fullname="John $greeting" echo $fullname # 获取字符串长度 string="Hello, World" echo ${#string} # 13 # 字符串切片 echo ${string:0:5} # Hello echo ${string:7} # World # 默认值 echo ${name:-"Guest"} # 如果name未设置或为空，使用"Guest" # 数组 arr=(apple banana cherry) echo ${arr[0]} # apple echo ${arr[@]} # 所有元素 echo ${#arr[@]} # 数组长度 arr[3]="date" # 添加元素 unset arr[1] # 删除元素控制结构条件判断 # if语句 if [ "$name" == "Alice" ]; then echo "Welcome, Alice!" elif [ "$name" == "Bob" ]; then echo "Welcome, Bob!" else echo "Welcome, Guest!" fi # 数字比较 count=10 if [ $count -eq 10 ]; then echo "Count is 10" fi if [ $count -gt 5 ]; then echo "Count is greater than 5" fi if [ $count -lt 20 ]; then echo "Count is less than 20" fi # 字符串比较 if [ "$string1" == "$string2" ]; then echo "Strings are equal" fi if [ -n "$string" ]; then echo "String is not empty" fi # 文件测试 if [ -f "file.txt" ]; then echo "File exists and is a regular file" fi if [ -d "/tmp" ]; then echo "Directory exists" fi if [ -r "file.txt" ]; then echo "File is readable" fi if [ -w "file.txt" ]; then echo "File is writable" fi if [ -x "script.sh" ]; then echo "File is executable" fi # 逻辑运算 if [ $count -gt 5 ] && [ $count -lt 20 ]; then echo "Count is between 5 and 20" fi if [ $count -lt 5 ] || [ $count -gt 20 ]; then echo "Count is outside range 5-20" fi # 使用test命令 if test -f "file.txt"; then echo "File exists" fi # 双括号（更强大的算术比较） if (( count > 5 && count < 20 )); then echo "Count is between 5 and 20" fi 循环结构 # for循环 for i in 1 2 3 4 5; do echo $i done # 遍历文件 for file in *.txt; do echo "Processing: $file" done # C风格for循环 for ((i=0; i<10; i++)); do echo $i done # while循环 count=0 while [ $count -lt 5 ]; do echo $count count=$((count + 1)) done # 读取文件行 while IFS= read -r line; do echo "$line" done < file.txt # until循环 count=0 until [ $count -ge 5 ]; do echo $count count=$((count + 1)) done # break和continue for i in {1..10}; do if [ $i -eq 5 ]; then continue # 跳过5 fi if [ $i -eq 8 ]; then break # 在8处停止 fi echo $i done case语句 # 简单的case语句 read -p "Enter a color: " color case $color in red) echo "You chose red" ;; blue|green) echo "You chose blue or green" ;; *) echo "You chose something else" ;; esac # 复杂的case语句 case $1 in start) echo "Starting service..." ;; stop) echo "Stopping service..." ;; restart) echo "Restarting service..." ;; status) echo "Checking service status..." ;; *) echo "Usage: $0 {start|stop|restart|status}" exit 1 ;; esac 函数编程定义和使用函数 # 定义函数 greet() { echo "Hello, $1!" } # 调用函数 greet "Alice" # 返回值 add() { local result=$(($1 + $2)) echo $result } sum=$(add 5 3) echo "Sum: $sum" # 返回状态码 check_file() { if [ -f "$1" ]; then return 0 # 成功 else return 1 # 失败 fi } if check_file "file.txt"; then echo "File exists" else echo "File does not exist" fi # 局部变量 global_var="I am global" my_function() { local local_var="I am local" echo "Inside function: $local_var" echo "Inside function: $global_var" } my_function echo "Outside function: $global_var" # echo "Outside function: $local_var" # 错误：local_var未定义函数参数 # 处理多个参数 process_args() { echo "First argument: $1" echo "Second argument: $2" echo "All arguments: $@" echo "Number of arguments: $#" echo "Script name: $0" } process_args arg1 arg2 arg3 # 遍历所有参数 iterate_args() { for arg in "$@"; do echo "Processing: $arg" done } iterate_args file1.txt file2.txt file3.txt # shift命令 shift_test() { echo "Total arguments: $#" echo "First: $1" shift echo "After shift, first: $1" echo "Remaining arguments: $#" } shift_test a b c d 递归函数 # 阶乘（尾递归） factorial() { local n=$1 local acc=${2:-1} if [ $n -le 1 ]; then echo $acc else factorial $((n - 1)) $((acc * n)) fi } echo "Factorial of 5: $(factorial 5)" # Fibonacci fibonacci() { local n=$1 if [ $n -le 1 ]; then echo $n else echo $(( $(fibonacci $((n - 1))) + $(fibonacci $((n - 2))) )) fi } echo "Fibonacci of 10: $(fibonacci 10)" 输入输出读取用户输入 # 简单输入 read -p "Enter your name: " name echo "Hello, $name!" # 密码输入（不显示） read -s -p "Enter password: " password echo # 带超时的输入 read -t 5 -p "Enter your choice (5 seconds): " choice echo "You chose: $choice" # 读取多个值 read -p "Enter name age: " name age echo "Name: $name, Age: $age" # 从文件读取 while IFS= read -r line; do echo "Line: $line" done < input.txt # 读取确认 read -p "Continue? (y/n): " confirm if [[ $confirm == [yY] ]]; then echo "Continuing..." else echo "Aborting..." exit 1 fi 输出格式化 # echo选项 echo -n "No newline" # 不换行 echo -e "Line1\nLine2" # 解释转义字符 echo "Hello\tWorld" # 需要配合-e # printf格式化输出 printf "Name: %s, Age: %d\n" "Alice" 25 printf "Pi: %.2f\n" 3.14159 printf "%-10s %10s\n" "Left" "Right" # 重定向输出 echo "Error message" >&2 # 输出到stderr echo "Log message" >> logfile # 追加到文件 # 管道 echo "Hello World" | tr '[:upper:]' '[:lower:]' # Here文档 cat << EOF This is a multi-line string using Here document. EOF # Here字符串 grep "pattern" <<< "This is a string to search" 命令行参数处理位置参数 #!/bin/bash # script.sh echo "Script name: $0" echo "First argument: $1" echo "Second argument: $2" echo "All arguments: $@" echo "Number of arguments: $#" # 检查参数数量 if [ $# -lt 2 ]; then echo "Usage: $0 <arg1> <arg2>" exit 1 fi 使用getopts #!/bin/bash # 使用getopts处理选项 usage() { echo "Usage: $0 [-a] [-b VALUE] [-c] filename" exit 1 } while getopts ":ab:c" opt; do case $opt in a) echo "Option -a triggered" ;; b) echo "Option -b triggered with value: $OPTARG" value=$OPTARG ;; c) echo "Option -c triggered" ;; \?) echo "Invalid option: -$OPTARG" usage ;; :) echo "Option -$OPTARG requires an argument" usage ;; esac done shift $((OPTIND-1)) echo "Remaining arguments: $@" 使用getopt（更强大） #!/bin/bash # 使用getopt处理长选项 TEMP=$(getopt -o ab:c:: --long alpha,bravo:,charlie:: -n 'example.sh' -- "$@") if [ $? != 0 ]; then echo "Terminating..." >&2 exit 1 fi eval set -- "$TEMP" while true; do case "$1" in -a|--alpha) echo "Option a" shift ;; -b|--bravo) echo "Option b, argument '$2'" shift 2 ;; -c|--charlie) case "$2" in "") echo "Option c, no argument" shift 2 ;; *) echo "Option c, argument '$2'" shift 2 ;; esac ;; --) shift break ;; *) echo "Internal error!" exit 1 ;; esac done echo "Remaining arguments:" for arg in "$@"; do echo " --> '$arg'" done 信号处理捕获中断 #!/bin/bash # 捕获Ctrl+C cleanup() { echo "Cleaning up..." # 删除临时文件等 rm -f /tmp/my_script_temp* exit 1 } trap cleanup SIGINT SIGTERM echo "Press Ctrl+C to interrupt..." for i in {1..100}; do echo "Working... $i" sleep 1 done 捕获EXIT信号 #!/bin/bash # 确保清理代码总是执行 cleanup() { echo "Script is exiting..." rm -f /tmp/tempfile } trap cleanup EXIT # 创建临时文件 touch /tmp/tempfile echo "Doing some work..." # 即使脚本出错，cleanup也会执行文本处理文件操作 # 读取文件 while IFS= read -r line; do echo "$line" done < file.txt # 写入文件 echo "Hello" > output.txt echo "World" >> output.txt # 检查文件是否存在 if [ -f "file.txt" ]; then echo "File exists" fi # 检查文件是否可读 if [ -r "file.txt" ]; then echo "File is readable" fi # 获取文件大小 size=$(wc -c < file.txt) echo "File size: $size bytes" # 获取行数 lines=$(wc -l < file.txt) echo "File lines: $lines" 文本转换 # 转换为大写 echo "hello" | tr '[:lower:]' '[:upper:]' # 删除重复行 sort file.txt | uniq # 只显示重复行 sort file.txt | uniq -d # 统计重复次数 sort file.txt | uniq -c # 替换文本 sed 's/old/new/g' file.txt # 删除空行 sed '/^$/d' file.txt # 提取特定列 awk '{print $1, $3}' file.txt # 按模式分割文件 awk '/pattern/{filename="part_"++count".txt"; print > filename}' 进程管理后台执行 # 后台运行 command & # 后台运行并重定向输出 command > /dev/null 2>&1 & # 使用nohup（退出终端后继续运行） nohup command & # 查看后台任务 jobs # 带回后台任务 fg %1 # 继续后台任务 bg %1 # 杀死后台任务 kill %1 进程监控 # 查看进程 ps aux # 查找特定进程 ps aux | grep nginx # 实时监控 top # 杀死进程 kill PID kill -9 PID # 强制杀死 # 等待进程完成 wait PID 调试技巧调试模式 #!/bin/bash # 启用调试模式 set -x # 在执行前打印命令 set -v # 打印输入行 # 或者 bash -x script.sh # 只调试部分代码 set -x # 开始调试 # 需要调试的代码 set +x # 结束调试错误处理 #!/bin/bash # 遇到错误立即退出 set -e # 使用未定义变量时报错 set -u # 管道命令失败时退出 set -o pipefail # 组合使用 set -euo pipefail # 捕获错误 trap 'echo "Error on line $LINENO"; exit 1' ERR 日志记录 #!/bin/bash # 日志函数 log() { local level=$1 shift echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $@" | tee -a script.log } log INFO "Script started" log ERROR "An error occurred" log WARNING "This is a warning" 实用示例系统监控脚本 #!/bin/bash # 系统监控脚本 while true; do clear echo "=== System Monitor ===" echo "Time: $(date)" echo # CPU使用率 echo "CPU Usage:" top -bn1 | grep "Cpu(s)" | sed "s/.*, *$[0-9.]*$%* id.*/\1/" | awk '{print 100 - $1"%"}' # 内存使用 echo -e "\nMemory Usage:" free -h # 磁盘使用 echo -e "\nDisk Usage:" df -h sleep 5 done 日志分析脚本 #!/bin/bash # Apache日志分析 log_file="/var/log/apache2/access.log" echo "=== Top 10 IPs ===" awk '{print $1}' "$log_file" | sort | uniq -c | sort -rn | head echo -e "\n=== Top 10 URLs ===" awk '{print $7}' "$log_file" | sort | uniq -c | sort -rn | head echo -e "\n=== HTTP Status Codes ===" awk '{print $9}' "$log_file" | sort | uniq -c | sort -rn 自动备份脚本 #!/bin/bash # 自动备份脚本 SOURCE_DIR="/path/to/source" BACKUP_DIR="/path/to/backup" DATE=$(date +%Y%m%d_%H%M%S) BACKUP_NAME="backup_$DATE.tar.gz" # 创建备份 echo "Creating backup..." tar -czf "$BACKUP_DIR/$BACKUP_NAME" "$SOURCE_DIR" # 删除30天前的备份 find "$BACKUP_DIR" -name "backup_*.tar.gz" -mtime +30 -delete echo "Backup completed: $BACKUP_NAME" 最佳实践代码风格使用Shebang：始终在脚本开头指定解释器添加注释：解释复杂逻辑和重要步骤使用有意义的变量名：避免单字母变量（除循环变量外）缩进代码：使用一致的缩进（通常是4个空格）引用变量：始终使用引号包裹变量（"$var"而非$var）安全建议验证输入：始终验证用户输入和参数使用绝对路径：避免路径混淆最小权限原则：只授予必要的权限清理临时文件：脚本结束时清理避免eval：除非绝对必要，否则不使用eval 性能优化避免外部命令：尽量使用内置功能减少子shell：避免不必要的进程创建使用管道：而不是临时文件批量处理：一次处理多个项目缓存结果：避免重复计算小结 Shell脚本是系统管理和自动化的强大工具。通过本文，你学习了： ...

开发工具与编程技巧集锦

优秀的开发者不仅要掌握语言本身，更要熟悉各种开发工具和编程技巧。本文汇集了多种语言的实用技巧和工具，帮助你提升开发效率和代码质量。 JavaScript实用技巧数组操作基本排序 // 数字数组排序（从小到大） function compare(num1, num2) { return num1 - num2; } var nums = [3, 1, 2, 100, 4, 200]; nums.sort(compare); console.log(nums); // [1, 2, 3, 4, 100, 200] // 从大到小排序 function compareDesc(num1, num2) { return num2 - num1; } nums.sort(compareDesc); console.log(nums); // [200, 100, 4, 3, 2, 1] 注意：JavaScript的sort()方法默认将元素转换为字符串排序，所以对数字需要自定义比较函数。迭代器方法 // map：创建新数组 function first(word) { return word[0]; } var words = ["for", "your", "info"]; var acronym = words.map(first); console.log(acronym.join("")); // "fyi" // 数值计算 var numbers = [1, 2, 3, 4, 5]; var doubled = numbers.map(x => x * 2); console.log(doubled); // [2, 4, 6, 8, 10] filter过滤 // 筛选及格成绩 function passing(num) { return num >= 60; } var grades = []; for (var i = 0; i < 20; i++) { grades[i] = Math.floor(Math.random() * 101); } var passGrades = grades.filter(passing); console.log("全部成绩:", grades); console.log("及格成绩:", passGrades); // 筛选偶数 var nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; var evens = nums.filter(n => n % 2 === 0); console.log(evens); // [2, 4, 6, 8, 10] reduce累加 // 数组求和 var numbers = [1, 2, 3, 4, 5]; var sum = numbers.reduce((total, num) => total + num, 0); console.log(sum); // 15 // 数组最大值 var max = numbers.reduce((a, b) => Math.max(a, b)); console.log(max); // 5 // 统计字符出现次数 var str = "hello world"; var charCount = str.split('').reduce((count, char) => { count[char] = (count[char] || 0) + 1; return count; }, {}); console.log(charCount); // {h: 1, e: 1, l: 3, o: 2, ' ': 1, w: 1, r: 1, d: 1} some和every // some：是否存在满足条件的元素 var numbers = [1, 2, 3, 4, 5]; var hasEven = numbers.some(n => n % 2 === 0); console.log(hasEven); // true // every：是否所有元素都满足条件 var allPositive = numbers.every(n => n > 0); console.log(allPositive); // true var allGreaterThanThree = numbers.every(n => n > 3); console.log(allGreaterThanThree); // false forEach遍历 var colors = ["red", "green", "blue"]; colors.forEach((color, index) => { console.log(`${index}: ${color}`); }); // 输出： // 0: red // 1: green // 2: blue 二维数组操作计算学生平均分 // 计算每个学生的平均分 var grades = [ [89, 77, 78], [76, 82, 81], [91, 94, 89] ]; var total = 0; var average = 0.0; for (var row = 0; row < grades.length; row++) { for (var col = 0; col < grades[row].length; col++) { total += grades[row][col]; } average = total / grades[row].length; console.log("Student " + (row + 1) + " average: " + average.toFixed(2)); total = 0; average = 0.0; } // 输出： // Student 1 average: 81.33 // Student 2 average: 79.67 // Student 3 average: 91.33 计算科目平均分 // 计算每门考试的平均分 var grades = [ [89, 77, 78], [76, 82, 81], [91, 94, 89] ]; var total = 0; var average = 0.0; for (var col = 0; col < grades[0].length; col++) { for (var row = 0; row < grades.length; row++) { total += grades[row][col]; } average = total / grades.length; console.log("Test " + (col + 1) + " average: " + average.toFixed(2)); total = 0; average = 0.0; } // 输出： // Test 1 average: 85.33 // Test 2 average: 84.33 // Test 3 average: 82.67 对象操作按键排序对象 // 按对象的某个键排序 var obj = [ {name: "Alice", age: 25}, {name: "Bob", age: 20}, {name: "Charlie", age: 30} ]; obj.sort((a, b) => a.age - b.age); console.log(obj); // [ // {name: "Bob", age: 20}, // {name: "Alice", age: 25}, // {name: "Charlie", age: 30} // ] 对象解构 // 解构赋值 var person = {name: "Alice", age: 25, city: "New York"}; var {name, age} = person; console.log(name); // "Alice" console.log(age); // 25 // 嵌套解构 var data = { user: { name: "Bob", address: { city: "Boston" } } }; var {user: {address: {city}}} = data; console.log(city); // "Boston" jQuery实用技巧 jQuery是与否 // 检查jQuery是否加载 if (typeof jQuery === 'undefined') { console.log('jQuery not loaded'); } else { console.log('jQuery loaded'); } // 检查元素是否存在 if ($('#myElement').length) { console.log('Element exists'); } jQuery引入方式  <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>  <script src="/js/jquery-3.6.0.min.js"></script>   <script src="./node_modules/jquery/dist/jquery.min.js"></script>  <script> require(['jquery'], function($) { $(document).ready(function() { console.log('jQuery loaded via RequireJS'); }); }); </script> Python数据结构与算法链表实现基础链表 class Node: def __init__(self, value): self.value = value self.next = None def __str__(self): return str(self.value) class LinkedList: def __init__(self): self.head = None self.tail = None def addNode(self, value): node = Node(value) if self.head is None: self.head = node self.tail = node else: self.tail.next = node self.tail = node def __str__(self): if self.head is not None: index = self.head nodeStore = [str(index.value)] while index.next is not None: index = index.next nodeStore.append(str(index.value)) return "LinkedList [ " + "->".join(nodeStore) + " ]" return "LinkedList []" def generateLinkedList(numArray): linkedlist = LinkedList() for i in range(len(numArray)): linkedlist.addNode(numArray[i]) return linkedlist # 使用示例 list1 = generateLinkedList([2, 4, 3]) print(list1) # LinkedList [ 2->4->3 ] 链表相加 class ListsSum: def addLists(self, l1, l2): p1 = l1.head p2 = l2.head carry = 0 linkedlist_sum = LinkedList() while (p1 is not None) or (p2 is not None) or (carry != 0): dig_sum = carry if p1 is not None: dig_sum += p1.value p1 = p1.next if p2 is not None: dig_sum += p2.value p2 = p2.next linkedlist_sum.addNode(dig_sum % 10) carry = dig_sum // 10 return linkedlist_sum # 使用示例 solution = ListsSum() list1 = generateLinkedList([2, 4, 3]) # 342 list2 = generateLinkedList([5, 6, 4]) # 465 print(solution.addLists(list1, list2)) # 807: LinkedList [ 7->0->8 ] 栈的实现 class Stack: def __init__(self): self.items = [] def push(self, item): self.items.append(item) def pop(self): if not self.is_empty(): return self.items.pop() def is_empty(self): return len(self.items) == 0 def peek(self): if not self.is_empty(): return self.items[-1] def size(self): return len(self.items) # 使用示例 stack = Stack() stack.push(1) stack.push(2) stack.push(3) print(stack.pop()) # 3 print(stack.peek()) # 2 print(stack.size()) # 2 算法实践技巧 Java算法练习提示使用Scanner处理输入 Scanner scanner = new Scanner(System.in); int n = scanner.nextInt(); String str = scanner.nextLine(); 数组初始化技巧 // 动态数组 ArrayList<Integer> list = new ArrayList<>(); // 固定大小数组 int[] arr = new int[n]; // 二维数组 int[][] matrix = new int[m][n]; 常用工具方法 // 数组排序 Arrays.sort(arr); // 数组转字符串 Arrays.toString(arr); // 填充数组 Arrays.fill(arr, value); Perl技巧集锦 Perl One-Liners 文本处理 # 删除重复行 perl -ne 'print unless $a{$_}++' file.txt # 查找重复行 perl -ne 'print if $a{$_}++' file.txt # 添加行号 perl -ne 'print "$. $_"' file.txt # 反转行顺序 perl -e 'print reverse <>' file.txt # 随机排序行 perl -e 'print shuffle <>' file.txt 数值计算 # 计算列的总和 perl -lane '$sum += $F[0]; END { print $sum }' file.txt # 计算平均值 perl -lane '$sum += $F[0]; $count++; END { print $sum/$count }' file.txt # 查找最大值 perl -lane '$max = $F[0] if !defined $max || $F[0] > $max; END { print $max }' file.txt Perl高级特性静态变量 # 使用state定义静态变量（Perl 5.10+） use feature 'state'; sub counter { state $count = 0; return ++$count; } print counter(); # 1 print counter(); # 2 print counter(); # 3 # 老版本方法 sub counter_old { my $count; $count ||= 0; return ++$count; } 匿名子例程 # 创建闭包 sub create_counter { my $count = 0; return sub { return ++$count; }; } my $counter1 = create_counter(); my $counter2 = create_counter(); print $counter1->(); # 1 print $counter1->(); # 2 print $counter2->(); # 1 数据结构实现对比链表的多种实现 Perl链表实现 package LinkedList; sub new { my $class = shift; my $self = { head => undef, tail => undef, }; bless $self, $class; return $self; } sub add_node { my ($self, $value) = @_; my $node = {value => $value, next => undef}; if (!defined $self->{head}) { $self->{head} = $node; $self->{tail} = $node; } else { $self->{tail}{next} = $node; $self->{tail} = $node; } } C链表实现 typedef struct Node { int value; struct Node* next; } Node; typedef struct LinkedList { Node* head; Node* tail; } LinkedList; void addNode(LinkedList* list, int value) { Node* node = (Node*)malloc(sizeof(Node)); node->value = value; node->next = NULL; if (list->head == NULL) { list->head = node; list->tail = node; } else { list->tail->next = node; list->tail = node; } } 线性表的顺序存储指针实现（动态数组） typedef struct { int* data; int length; int capacity; } SeqList; void initList(SeqList* list, int capacity) { list->data = (int*)malloc(sizeof(int) * capacity); list->length = 0; list->capacity = capacity; } void insert(SeqList* list, int index, int value) { if (index < 0 || index > list->length) { return; // 索引越界 } if (list->length >= list->capacity) { // 扩容 int newCapacity = list->capacity * 2; int* newData = (int*)realloc(list->data, sizeof(int) * newCapacity); if (newData) { list->data = newData; list->capacity = newCapacity; } } // 移动元素 for (int i = list->length; i > index; i--) { list->data[i] = list->data[i - 1]; } list->data[index] = value; list->length++; } 引用实现（智能指针） #include <memory> #include <vector> class SmartList { private: std::shared_ptr<std::vector<int>> data; public: SmartList() : data(std::make_shared<std::vector<int>>()) {} void insert(int index, int value) { if (index >= 0 && index <= data->size()) { data->insert(data->begin() + index, value); } } int get(int index) const { if (index >= 0 && index < data->size()) { return (*data)[index]; } return -1; // 或抛出异常 } int size() const { return data->size(); } }; 二叉树遍历递归实现 class TreeNode: def __init__(self, value): self.value = value self.left = None self.right = None def preorder_traversal(node): """前序遍历：根-左-右""" if node: print(node.value) preorder_traversal(node.left) preorder_traversal(node.right) def inorder_traversal(node): """中序遍历：左-根-右""" if node: inorder_traversal(node.left) print(node.value) inorder_traversal(node.right) def postorder_traversal(node): """后序遍历：左-右-根""" if node: postorder_traversal(node.left) postorder_traversal(node.right) print(node.value) 非递归实现（使用栈） def preorder_iterative(root): """前序遍历非递归实现""" if not root: return stack = [root] while stack: node = stack.pop() print(node.value) # 先右后左，保证左子树先处理 if node.right: stack.append(node.right) if node.left: stack.append(node.left) def inorder_iterative(root): """中序遍历非递归实现""" stack = [] current = root while current or stack: # 到达最左节点 while current: stack.append(current) current = current.left current = stack.pop() print(current.value) current = current.right 实用编程技巧文件批量操作 Perl批量重命名 use strict; use warnings; use Cwd; my $target_dir = getcwd(); opendir(my $dh, $target_dir) || die "can't opendir $target_dir: $!"; my @files = grep { /\w/ && -f "$_" && !/^\./ } readdir($dh); for (@files) { my $file = $_; # 示例：[Alex_Holmes]_Hadoop_in_Practice(BookZZ.org).pdf # 转换为：Hadoop_in_Practice.pdf if (/^(?:\[[\S\s]+\])([\S\s]+)(?:$[\S\s]+$)\.pdf$/) { my $new_name = $1 . ".pdf"; rename($file, $new_name) || die("error in renaming: $!"); } } Python批量操作 import os import re def batch_rename(directory): pattern = re.compile(r'\[.*?\](.*?)$.*?$\.pdf$') for filename in os.listdir(directory): match = pattern.match(filename) if match: new_name = match.group(1) + '.pdf' old_path = os.path.join(directory, filename) new_path = os.path.join(directory, new_name) os.rename(old_path, new_path) print(f"Renamed: {filename} -> {new_name}") batch_rename('.') 文本处理技巧删除^M字符 # Vim中删除DOS换行符 :%s/^M//g # ^M输入方法：Ctrl+V，然后Enter # Perl脚本删除^M并删除注释 open($IN, $ARGV[0]) or die "in: $@"; open($OUT, ">", $ARGV[0] . ".new") or die "out: $@"; while (<$IN>) { my $line = $_; $line =~ s/(\/\/.*)//g; # 删除C风格注释 $line =~ s/\r//g; # 删除^M print $OUT $line; } close($IN); close($OUT); # 转换并替换原文件 $command = "mv $ARGV[0].new $ARGV[0] && chmod 777 $ARGV[0] && dos2unix $ARGV[0]"; system($command); 命令行工具技巧按行长度排序 # 按行长度从长到短排序 cat file.txt | awk '{ print length($0) " " $0; }' | sort -r -n | cut -d ' ' -f 2- # 按行长度从短到长排序 cat file.txt | awk '{ print length($0) " " $0; }' | sort -n | cut -d ' ' -f 2- 提取公共行 # 查找多个文件中的公共行 grep -F -x -f file1 file2 file3 # 查找在file1中但不在file2中的行 grep -F -x -v -f file2 file1 统计最常用命令 # 查看最常用的10个命令 history | awk '{a[$2]++} END {for(i in a) {print a[i]" "i}}' | sort -rn | head 模块化编程创建可重用模块 # MyUtils.pm package MyUtils; use strict; use warnings; use Exporter 'import'; our @EXPORT_OK = qw(add multiply); sub add { my ($a, $b) = @_; return $a + $b; } sub multiply { my ($a, $b) = @_; return $a * $b; } 1; # 使用模块 use MyUtils qw(add multiply); print add(2, 3); # 5 print multiply(2, 3); # 6 Python模块化 # utils.py def add(a, b): return a + b def multiply(a, b): return a * b # main.py from utils import add, multiply print(add(2, 3)) # 5 print(multiply(2, 3)) # 6 性能优化技巧尾递归优化 // 普通递归阶乘 int factorial(int n) { if (n <= 1) return 1; return n * factorial(n - 1); } // 尾递归优化版本 int factorial_tail(int n, int accumulator) { if (n <= 1) return accumulator; return factorial_tail(n - 1, n * accumulator); } int factorial(int n) { return factorial_tail(n, 1); } 记忆化技术 # Fibonacci记忆化 from functools import lru_cache @lru_cache(maxsize=None) def fibonacci(n): if n < 2: return n return fibonacci(n - 1) + fibonacci(n - 2) # 手动实现记忆化 def fibonacci_memo(): cache = {} def fib(n): if n in cache: return cache[n] if n < 2: result = n else: result = fib(n - 1) + fib(n - 2) cache[n] = result return result return fib fib = fibonacci_memo() 小结本文汇集了多种编程语言和工具的实用技巧： ...