Add Vale #3694
Merged
merged 40 commits into from
Apr 25, 2025
5fcb8a0
Add Vale
Blargian Apr 17, 2025
295f080
Introduce a change to docs to test
Blargian Apr 17, 2025
b761e0f
edit workflow
Blargian Apr 17, 2025
e38ee13
Modify workflow to run on PR for testing
Blargian Apr 17, 2025
d60d996
Fix workflow error
Blargian Apr 17, 2025
0abfeaf
Install vale
Blargian Apr 17, 2025
b5bff81
logic for finding changed lines
Blargian Apr 17, 2025
18eeb92
Changes to changed line step
Blargian Apr 17, 2025
6136410
Debugging lines changed step
Blargian Apr 17, 2025
02ff812
Debug
Blargian Apr 17, 2025
063d2b9
Modify Run vale on changed files step
Blargian Apr 17, 2025
7bdcb32
set styles path in .vale.ini
Blargian Apr 17, 2025
505c623
Fix styles path
Blargian Apr 17, 2025
a50b566
Fix
Blargian Apr 17, 2025
0b4561a
Change to repository checkout
Blargian Apr 17, 2025
fadb84c
changes to checkout step
Blargian Apr 17, 2025
e8888d0
Fix checkout
Blargian Apr 18, 2025
2d6a425
Fix checkout step
Blargian Apr 18, 2025
69a099f
Debug why Vale is not finding our style folder
Blargian Apr 18, 2025
77d0e91
Debug why ClickHouse directory is not found within styles
Blargian Apr 18, 2025
7e05f55
more debugging
Blargian Apr 18, 2025
b9b22a6
Fix issue with styles/ClickHouse not being found
Blargian Apr 18, 2025
4ce08bd
move changed lines logic to own script
Blargian Apr 18, 2025
15c680e
Update the 'Run vale on changed files' step
Blargian Apr 18, 2025
7fbf856
Fix incorrect path name
Blargian Apr 18, 2025
2a7311a
add a test
Blargian Apr 18, 2025
7447574
update workflow
Blargian Apr 18, 2025
6c76b12
Update to use github annotations
Blargian Apr 18, 2025
719ecee
Fix error in annotation
Blargian Apr 18, 2025
f68496e
Add check-prose for running vale locally and run it with yarn check-s…
Blargian Apr 19, 2025
b881e11
Add some ClickHouse specific style rules
Blargian Apr 19, 2025
2150036
Change min alert level to error
Blargian Apr 19, 2025
3adf474
Add some more test mistakes
Blargian Apr 19, 2025
a113ca7
newline end of file
Blargian Apr 25, 2025
63a488c
newline end of file
Blargian Apr 25, 2025
bc6b26f
Fix vale suggestions
Blargian Apr 25, 2025
6a9df55
newline end of file
Blargian Apr 25, 2025
05c2902
Update vale_output.log
Blargian Apr 25, 2025
18078ab
Restore test changes
Blargian Apr 25, 2025
61afa35
Update json_type.md
Blargian Apr 25, 2025
19 changes: 0 additions & 19 deletions .github/workflows/trigger-build.yml

This file was deleted.

76 changes: 76 additions & 0 deletions .github/workflows/vale-linter.yml
@@ -0,0 +1,76 @@
name: Style check

on:
  pull_request:
    types:
      - synchronize
      - reopened
      - opened

permissions:
  contents: read

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref }}
  cancel-in-progress: true

jobs:
  vale:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
          path: .

      - name: Install Vale
        run: |
          sudo snap install vale
          vale -v # Verify installation

      - name: Set up Python
        run: |
          curl -Ls https://astral.sh/uv/install.sh | sh
          uv python install 3.12

      - name: Log changed lines
        run: |
          # Make sure script is executable
          chmod +x scripts/vale/changed_lines_to_json.py

          # Run the script to get changed lines
          python scripts/vale/changed_lines_to_json.py ${{ github.event.pull_request.base.sha }} ${{ github.event.pull_request.head.sha }}

          # Check if the report was created
          if [ -f "logs/changed_lines.json" ]; then
            echo "Changed lines log generated successfully."
          else
            echo "Error: Failed to generate changed lines report."
            exit 1
          fi

      - name: Run vale on changed files
        run: |
          # Extract file names from the JSON report
          CHANGED_FILES=$(cat logs/changed_lines.json | jq -r '.[].filename' | tr '\n' ' ')

          # Check if we have any files to process
          if [ -z "$CHANGED_FILES" ]; then
            echo "No changed files to analyze"
            exit 0
          fi

          echo "Running Vale on: $CHANGED_FILES"
          vale --config='.vale.ini' \
            ${CHANGED_FILES} \
            --output=scripts/vale/vale_output_template.tmpl --no-exit > logs/vale_output.log

      - name: Parse Vale output
        run: |
          # Read the changed_lines.json to get line numbers
          CHANGED_LINES=$(cat logs/changed_lines.json)

          # Run the parser script
          python scripts/vale/vale_annotations.py --git-log-file="logs/changed_lines.json" --vale-log-file="logs/vale_output.log"
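The last workflow step hands both logs to `scripts/vale/vale_annotations.py`, which is not part of the diff shown here. As a rough illustration of what "use GitHub annotations" can mean, the sketch below formats a single alert as a GitHub Actions workflow command. The function name `to_annotation` and its parameters are hypothetical and are not taken from the PR; only the `::level file=...,line=...,col=...::message` command syntax is GitHub's.

```python
def to_annotation(filename, line, col, severity, message):
    """Format one alert as a GitHub Actions workflow command (hypothetical helper)."""
    # Map Vale severities onto annotation levels; anything else becomes a notice.
    level = {"error": "error", "warning": "warning"}.get(severity, "notice")
    # Workflow commands are single-line, so percent signs and line breaks are escaped.
    safe = message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
    return f"::{level} file={filename},line={line},col={col}::{safe}"


print(to_annotation("docs/best-practices/json_type.md", 11, 3, "error", "Use 'ClickHouse'"))
```

Printing such a line to standard output inside a job step is enough for GitHub to attach the message to the given file and line in the PR view.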
2 changes: 1 addition & 1 deletion .gitignore
@@ -57,7 +57,7 @@ docs/settings/beta-and-experimental-features.md

**.translated
**.translate
-ClickHouse/
+/ClickHouse/


# Ignore table of contents files
5 changes: 5 additions & 0 deletions .vale.ini
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
StylesPath = styles
MinAlertLevel = suggestion

[*.{md}]
BasedOnStyles = ClickHouse
2 changes: 1 addition & 1 deletion docs/best-practices/json_type.md
@@ -36,7 +36,7 @@ The JSON type enables efficient columnar storage by flattening paths into subcol
Type hints offer more than just a way to avoid unnecessary type inference - they eliminate storage and processing indirection entirely. JSON paths with type hints are always stored just like traditional columns, bypassing the need for [**discriminator columns**](https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse#storage-extension-for-dynamically-changing-data) or dynamic resolution during query time. This means that with well-defined type hints, nested JSON fields achieve the same performance and efficiency as if they were modeled as top-level fields from the outset. As a result, for datasets that are mostly consistent but still benefit from the flexibility of JSON, type hints provide a convenient way to preserve performance without needing to restructure your schema or ingest pipeline.
:::

-## Advanced Features {#advanced-features}
+## Advanced features {#advanced-features}

* JSON columns **can be used in primary keys** like any other columns. Codecs cannot be specified for a sub-column.
* They support introspection via functions like [`JSONAllPathsWithTypes()` and `JSONDynamicPaths()`](/sql-reference/data-types/newjson#introspection-functions).
5 changes: 3 additions & 2 deletions package.json
@@ -25,10 +25,11 @@
     "write-heading-ids": "docusaurus write-heading-ids",
     "run-markdown-linter": "yarn check-markdown",
     "run-indexer": "bash ./scripts/search/run_indexer.sh",
-    "check-style": "yarn check-markdown && yarn check-spelling && yarn check-kb",
+    "check-style": "yarn check-markdown && yarn check-spelling && yarn check-kb && yarn check-prose",
     "check-spelling": "./scripts/check-doc-aspell",
     "check-kb": "./scripts/check-kb.sh",
-    "check-markdown": "./scripts/check-markdown.sh"
+    "check-markdown": "./scripts/check-markdown.sh",
+    "check-prose": "./scripts/vale/check-prose.sh"
   },
   "dependencies": {
     "@clickhouse/click-ui": "^0.0.199",
104 changes: 104 additions & 0 deletions scripts/vale/changed_lines_to_json.py
@@ -0,0 +1,104 @@
#!/usr/bin/env python3
"""
Script to identify changed lines in Markdown files from a git diff.

This script finds all changed Markdown files in the docs directory
and creates a JSON file showing which lines were modified.

Usage:
    python scripts/vale/changed_lines_to_json.py <base_sha> <head_sha>

Output:
    Creates a JSON file at logs/changed_lines.json with format:
    [
        {
            "filename": "docs/path/to/file.md",
            "changed_lines": [11, 15, 20]
        },
        ...
    ]
"""

import json
import os
import re
import subprocess
import sys
from pathlib import Path


def get_changed_files(base_sha, head_sha, pattern=r'^docs/.*\.(md|mdx)$'):
    """Get list of changed files matching the pattern."""
    try:
        # Pass arguments as a list so that no shell quoting is needed
        cmd = ["git", "diff", "--name-only", base_sha, head_sha]
        result = subprocess.check_output(cmd, text=True)
        all_files = result.splitlines()

        # Filter files by pattern and skip files deleted in the PR
        changed_files = [f for f in all_files if re.match(pattern, f) and os.path.isfile(f)]

        print(f"Found {len(changed_files)} changed files matching pattern")
        return changed_files
    except subprocess.CalledProcessError as e:
        print(f"Error getting changed files: {e}")
        return []


def get_changed_lines(file_path, base_sha, head_sha):
    """Get line numbers that were changed in a specific file."""
    try:
        cmd = ["git", "diff", "--unified=0", base_sha, head_sha, "--", file_path]
        diff_output = subprocess.check_output(cmd, text=True)

        changed_lines = []
        for line in diff_output.splitlines():
            if line.startswith("@@"):
                # Hunk header: @@ -old_start[,old_count] +new_start[,new_count] @@
                match = re.search(r"^@@ -[0-9]+(?:,[0-9]+)? \+([0-9]+)(?:,([0-9]+))? @@", line)
                if match:
                    start = int(match.group(1))
                    # A missing count means one line; a count of 0 is a pure deletion
                    count = int(match.group(2)) if match.group(2) is not None else 1
                    # Record every line of the hunk, not only its first line
                    changed_lines.extend(range(start, start + count))

        return changed_lines
    except subprocess.CalledProcessError as e:
        print(f"Error getting changed lines for {file_path}: {e}")
        return []


def main():
    if len(sys.argv) < 3:
        print("Usage: python changed_lines_to_json.py <base_sha> <head_sha>")
        sys.exit(1)

    base_sha = sys.argv[1]
    head_sha = sys.argv[2]

    # Create output directory
    Path("logs").mkdir(exist_ok=True)

    # Get changed files
    changed_files = get_changed_files(base_sha, head_sha)

    # Process each file
    result = []
    for file in changed_files:
        print(f"Processing file: {file}")
        changed_lines = get_changed_lines(file, base_sha, head_sha)

        if changed_lines:
            result.append({
                "filename": file,
                "changed_lines": changed_lines
            })
            print(f"Found {len(changed_lines)} changed lines in {file}")

    # Write results to JSON file
    output_path = "logs/changed_lines.json"
    with open(output_path, "w") as f:
        json.dump(result, f, indent=2)

    print(f"Generated JSON log at {output_path}")

    # Print the log for debugging
    with open(output_path, "r") as f:
        print(f.read())


if __name__ == "__main__":
    main()
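For readers unfamiliar with the hunk headers this script parses: with `git diff --unified=0`, each header has the form `@@ -a[,b] +c[,d] @@`, where `c` is the first affected line in the new file and `d` (default 1) is the number of added lines. The following is a minimal, self-contained sketch of that mapping on a hand-written sample diff. The names `HUNK_RE`, `added_lines`, and `sample` are illustrative only and do not come from the PR.

```python
import re

# Captures new_start and the optional new_count from a unified-diff hunk header.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def added_lines(diff_text):
    """Return every new-file line number covered by the hunks in diff_text."""
    lines = []
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if not m:
            continue
        start = int(m.group(1))
        # No count means a single line; a count of 0 means the hunk only deletes.
        count = int(m.group(2)) if m.group(2) is not None else 1
        lines.extend(range(start, start + count))
    return lines


sample = (
    "@@ -10 +11 @@\n-old\n+new\n"
    "@@ -27,0 +29,3 @@\n+a\n+b\n+c\n"
    "@@ -40,2 +43,0 @@\n-x\n-y\n"
)
print(added_lines(sample))  # [11, 29, 30, 31]
```

The third hunk in the sample removes two lines without adding any, so it contributes no line numbers.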
129 changes: 129 additions & 0 deletions scripts/vale/check-prose.sh
@@ -0,0 +1,129 @@
#!/bin/bash
# Script to run Vale on locally changed files or specified files
# Usage:
#   1. Run on changed files:  ./scripts/vale/check-prose.sh
#   2. Run on specific files: ./scripts/vale/check-prose.sh -f "docs/**/*.md"
#   3. Run on list of files:  ./scripts/vale/check-prose.sh -f "docs/file1.md docs/file2.md"

# Get script directory and repository root
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

# Change to repository root for reliable paths
cd "$REPO_ROOT"

SCRIPT_NAME=$(basename "$0")

# Colors for ANSI output formatting
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
RED='\033[0;31m'
NC='\033[0m' # No Color

# Check that Vale is installed (a missing command exits with 127, not 1)
if ! command -v vale >/dev/null 2>&1
then
    echo "[$SCRIPT_NAME] Error: Vale not found. Please install vale."
    exit 1
else
    vale -v
    echo "[$SCRIPT_NAME] Success: Found Vale."
fi

# Default values
BASE_BRANCH="main"
FILE_PATTERN=""
USE_CHANGED_FILES=true

# Parse arguments
while [[ $# -gt 0 ]]; do
    case "$1" in
        -f|--files)
            USE_CHANGED_FILES=false
            FILE_PATTERN="$2"
            shift 2
            ;;
        *)
            echo -e "${RED}Invalid argument: $1${NC}"
            echo "Usage: ./$SCRIPT_NAME [-f|--files \"file_pattern_or_list\"]"
            exit 1
            ;;
    esac
done

if $USE_CHANGED_FILES; then
    # Get current branch name
    CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)

    echo -e "${GREEN}Running Vale check on files changed on $CURRENT_BRANCH branch${NC}"

    # Create logs directory
    mkdir -p logs

    # Find the merge-base (common ancestor) of main and current branch
    MERGE_BASE=$(git merge-base "$BASE_BRANCH" "$CURRENT_BRANCH")

    # Get changed files between merge-base and current branch
    CHANGED_FILES=$(git diff --name-only "$MERGE_BASE" "$CURRENT_BRANCH" | grep -E '^docs/.*\.(md|mdx)$' | tr '\n' ' ')

    # Also check for uncommitted changes
    UNCOMMITTED_FILES=$(git diff --name-only HEAD | grep -E '^docs/.*\.(md|mdx)$' | tr '\n' ' ')

    # And new untracked files that match our pattern
    UNTRACKED_FILES=$(git ls-files --others --exclude-standard | grep -E '^docs/.*\.(md|mdx)$' | tr '\n' ' ')

    # Combine all files, drop empty entries, and remove duplicates.
    # Without dropping empty entries the result is a single space when nothing
    # changed, and the emptiness check below would never trigger.
    ALL_FILES="$CHANGED_FILES $UNCOMMITTED_FILES $UNTRACKED_FILES"
    UNIQUE_FILES=$(echo "$ALL_FILES" | tr ' ' '\n' | grep -v '^$' | sort -u | tr '\n' ' ')
    CHANGED_FILES="$UNIQUE_FILES"

    # Check if there are any changed files
    if [ -z "$CHANGED_FILES" ]; then
        echo -e "${GREEN}No changed files to analyze${NC}"
        exit 0
    fi

    echo -e "${YELLOW}Running Vale on changed files: $CHANGED_FILES${NC}"

    # Run Vale on the changed files
    vale --config="$REPO_ROOT/.vale.ini" $CHANGED_FILES
else
    # Run Vale on the specified files using glob pattern or list
    echo -e "${YELLOW}Running Vale on files: $FILE_PATTERN${NC}"

    # Handle the case where multiple files or patterns are specified
    if [[ "$FILE_PATTERN" == *"*"* ]]; then
        # Contains wildcard, use find to expand. find prints paths as ./docs/...,
        # so the pattern needs the same ./ prefix to match.
        FILES_TO_CHECK=$(find . -type f -path "./${FILE_PATTERN#./}" | tr '\n' ' ')

        if [ -z "$FILES_TO_CHECK" ]; then
            echo -e "${RED}No files found matching pattern: $FILE_PATTERN${NC}"
            exit 1
        fi

        echo -e "${YELLOW}Found files: $FILES_TO_CHECK${NC}"
        vale --config="$REPO_ROOT/.vale.ini" $FILES_TO_CHECK
    else
        # Could be a space-separated list of files or a single file
        FILES_TO_CHECK=""

        # Split the input by spaces and check each file/pattern
        for file in $FILE_PATTERN; do
            if [ -f "$file" ]; then
                FILES_TO_CHECK="$FILES_TO_CHECK $file"
            else
                echo -e "${RED}Warning: File not found: $file${NC}"
            fi
        done

        if [ -z "$FILES_TO_CHECK" ]; then
            echo -e "${RED}No valid files found${NC}"
            exit 1
        fi

        echo -e "${YELLOW}Checking files: $FILES_TO_CHECK${NC}"
        vale --config="$REPO_ROOT/.vale.ini" $FILES_TO_CHECK
    fi
fi

echo -e "${GREEN}Vale check complete${NC}"
9 changes: 9 additions & 0 deletions scripts/vale/test/changed_lines.json
@@ -0,0 +1,9 @@
[
  {
    "filename": "docs/best-practices/json_type.md",
    "changed_lines": [
      11,
      29
    ]
  }
]
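A fixture in this shape can be used to decide whether a linter alert falls on a line the PR touched. The sketch below shows one way to do that lookup, assuming the consumer only needs a filename-to-line-set mapping. The helper names `load_changed_lines` and `on_changed_line` are hypothetical and are not taken from `vale_annotations.py`, which is not shown in this diff.

```python
import json


def load_changed_lines(text):
    """Turn the changed_lines.json structure into {filename: set_of_line_numbers}."""
    return {entry["filename"]: set(entry["changed_lines"]) for entry in json.loads(text)}


def on_changed_line(changed, filename, line):
    """True if the given line of the given file was modified in the PR."""
    return line in changed.get(filename, set())


fixture = '[{"filename": "docs/best-practices/json_type.md", "changed_lines": [11, 29]}]'
changed = load_changed_lines(fixture)
print(on_changed_line(changed, "docs/best-practices/json_type.md", 29))
print(on_changed_line(changed, "docs/best-practices/json_type.md", 12))
```

Using a set per file keeps each membership check constant-time, which matters little at this scale but keeps the filter simple when a PR touches many files.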