Porting Ruby library to TypeScript

Andrei Zhozhin


Sometimes you realize that existing tools no longer work for you because you have hit their limits: a tech stack that is complicated to run, unacceptable execution time, or a lack of expertise to customize them. When the alternatives do not work either, it is time to create something new. This is a story about porting a Ruby library to JavaScript/TypeScript to support ongoing projects.

Please note that porting a library is a serious time investment: it requires understanding two stacks, exploring other libraries, and doing a lot of debugging and googling. But it is an interesting and challenging endeavor. This is also my first serious contribution to Open Source.

This story is about html-proofer, a library that checks generated HTML for validity and consistency. It supports:

  • Images - checks that images exist, supports multiple srcset entries, checks alt attributes
  • Internal link validation - checks internal page referential consistency
  • External link validation - the main use case, detection of broken links
  • Scripts - checks that scripts are reachable
  • Favicon checks - checks that the referenced favicon is reachable
  • OpenGraph - checks OpenGraph referential consistency

This can be useful for validating the following kinds of static sites (in the context of Continuous Integration):

  • generated project documentation (javadoc, pandoc, sphinx, …)
  • blog
  • site (statically generated)
  • HTML version of the book

With additional scripting, you can scrape a dynamic site and run the validation on the resulting static HTML files, like this:

wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains www.website.com \
     --no-parent \
     www.website.com

Reasons to port a library to another language

  • Real need in several projects
  • Absence of good alternatives
  • Occasionally steep learning curve of the source language (Ruby)
  • Better availability of engineers for the target language (JavaScript)
  • Experience and Fun :)

Challenges

While porting html-proofer I faced several challenges:

  • Ruby has additional RegExp anchors
    • \A (beginning of string) and \z (end of string) are implemented in Ruby but not in JavaScript
    • ^ (beginning of line) and $ (end of line) exist in both languages, but in JavaScript they are line-based only with the m flag; without it they match the beginning and end of the whole string
  • Ruby has several literal syntaxes with their own escaping rules, which look a bit cryptic and require extra attention while porting to JavaScript
    • %() and %Q() - double-quoted strings, %q() - single-quoted strings
    • %r[] - regexp
  • The Ruby HTML parser (Nokogiri) supports both XPath and CSS selectors. In JavaScript, I had to use two different libraries (xmldom for XPath and cheerio for CSS) and parse the document twice during the port. Later I converted the XPath selectors to CSS selectors and adjusted the parsing logic accordingly, which simplified the implementation
  • The Ruby implementation has custom functions for XPath selectors to add case-insensitive behavior. In JavaScript, an XPath 1.0 solution looked terrible and XPath 2.0 was not available. Thankfully, CSS selectors support case insensitivity out of the box
  • I’m an expert in neither Ruby nor JavaScript, so I had to learn a lot about both during the porting process: local environment setup, build, tests, debugging, packaging, and publishing
  • All auxiliary Ruby libraries had to be replaced with JavaScript analogs, which meant searching, comparing, selecting, and adapting code to the 3rd party library APIs
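
To illustrate the anchor difference from the list above: since JavaScript's ^ and $ match the start and end of the whole string unless the m flag is set, Ruby's \A and \z can usually be translated to ^ and $ in a pattern without that flag. This is a minimal sketch; the patterns and sample strings are illustrative and are not taken from html-proofer.

```typescript
// Ruby: /\Ahttps?:\/\// is anchored to the start of the whole string.
// JavaScript equivalent: ^ without the `m` flag.
const startsWithHttp = /^https?:\/\//

// Ruby's ^ is always line-based; the JavaScript equivalent needs the `m` flag.
const anyLineStartsWithHttp = /^https?:\/\//m

const multiLine = 'see also\nhttps://example.com'

// The string as a whole does not start with a scheme...
const wholeStringMatch = startsWithHttp.test(multiLine)
// ...but its second line does.
const perLineMatch = anyLineStartsWithHttp.test(multiLine)

// Ruby: /\.html\z/ is anchored to the very end of the string.
// JavaScript equivalent: $ without the `m` flag.
const endsWithHtml = /\.html$/
const cleanMatch = endsWithHtml.test('index.html')
const trailingNewlineMatch = endsWithHtml.test('index.html\n')
```

Keeping the m flag off is what makes the translation safe; adding it would silently turn a whole-string check into a per-line check.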

Porting process

I did the port in two phases:

  1. Convert Ruby to JavaScript (ES5) - at this stage the library was working but was weakly typed
    1. Found 3rd party alternatives for all auxiliary libs
    2. Ported tests first, then app code - it was almost like TDD (Red, Green, Refactor)
    3. Implemented the CLI
  2. Convert JavaScript (ES5) to TypeScript - at this stage the library became strongly typed (at least during compilation)
    1. Defined interfaces for all internal and external communication
    2. Reduced visibility of properties and functions (using the public, protected, and private modifiers)
    3. Replaced all references to any with strongly typed definitions
    4. Adjusted the build process to support dual CommonJS and ESM targets
    5. Configured ESLint and SonarQube

Having a strongly typed version made it possible to remove some tests that only checked whether input parameters had the proper shape. It also helped to identify the weak places and fix the logic.
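
As a rough illustration of why such shape-checking tests become unnecessary, compare a runtime-validated function with a typed one. The names below (CheckOptions, describeOptions, describeOptionsUntyped) are hypothetical and are not part of the html-proofer.js API.

```typescript
// Hypothetical options shape; not the actual html-proofer.js interface.
interface CheckOptions {
  checkExternalLinks: boolean
  ignoreUrls: string[]
}

// JavaScript-style version: the shape has to be validated at runtime,
// and that validation needs tests of its own.
function describeOptionsUntyped(opts: any): string {
  if (typeof opts?.checkExternalLinks !== 'boolean' || !Array.isArray(opts?.ignoreUrls)) {
    throw new TypeError('invalid options')
  }
  return `external=${opts.checkExternalLinks}, ignored=${opts.ignoreUrls.length}`
}

// TypeScript version: a call with the wrong shape does not compile,
// so no runtime check (and no test for it) is needed.
function describeOptions(opts: CheckOptions): string {
  return `external=${opts.checkExternalLinks}, ignored=${opts.ignoreUrls.length}`
}

const summary = describeOptions({ checkExternalLinks: true, ignoreUrls: ['http://localhost'] })
// describeOptions({ checkExternalLinks: 'yes' })  // rejected by the compiler
```

The typed version only protects callers that are compiled with TypeScript; input that arrives from outside (for example CLI arguments) still needs runtime validation.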

Table with 3rd party libraries and their functionality

| Ruby library | JavaScript library | Functionality                                                         |
|--------------|--------------------|-----------------------------------------------------------------------|
| nokogiri     | cheerio            | HTML document parsing                                                 |
| mercenary    | commander          | Command line argument parsing                                         |
| typhoeus     | axios              | HTTP client for external checks                                       |
| addressable  | urijs              | URL/URI parsing                                                       |
| rspec        | jest               | Testing library/framework                                             |
| vcr          | axios-vcr          | Record and replay HTTP calls                                          |
| yell         | log4js             | Logging library                                                       |
| zeitwerk     | n/a                | Code loader; JavaScript has built-in module loading                   |
| parallel     | n/a                | Parallel code execution; JavaScript runs code asynchronously*         |
| timecop      | n/a                | Time freezing for cache/tests; not implemented in the current version |
| rainbow      | n/a                | Console output coloring; implemented only partially                   |

* In the current version, only one external call is performed at a time

Importance of tests

The source library in Ruby has a great set of tests implemented in RSpec. This made it relatively easy to start the port from the tests, and it provided a good safety net throughout the process. The original specs cover both the library and the CLI, so from a functional perspective it was awesome: I was able to convert tests one to one from RSpec to Jest, preserving pretty much everything.

Ruby version:

describe "Links test" do  
...
  it "fails for broken hashes on the web when asked (even if the file exists)" do  
    broken_hash_on_the_web = File.join(FIXTURES_DIR, "links", "broken_hash_on_the_web.html")  
    proofer = run_proofer(broken_hash_on_the_web, :file)  
    expect(proofer.failed_checks.first.description).to(match(/but the hash 'no' does not/))  
  end
...
end

JavaScript version:

describe('Links test', () => {
...
 it('fails for broken hashes on the web when asked (even if the file exists)', async () => {  
    const broken_hash = path.join(FIXTURES_DIR, 'links', 'broken_hash_on_the_web.html')  
    const proofer = await createAndRunProofer(broken_hash, CheckType.FILE)  
    expect(proofer.failedChecks[0].description).toMatch(/but the hash 'no' does not/)  
 })
...
})

Importance of tooling

I used JetBrains' WebStorm (for JavaScript) and RubyMine (for Ruby) at the same time. This made it possible to debug the same test case in two IDEs side by side and identify issues quickly. This alone saved me a lot of time.

During the first porting phase (Ruby to JavaScript), WebStorm helped a lot with its code inspections, which highlighted potential type mismatches. It could not detect every issue, though: even when the code looked ‘compilable’, it sometimes failed because of interface mismatches that only show up at runtime, so debugging was the only way to identify the problem. After porting from JavaScript to TypeScript, WebStorm was able to find all remaining inconsistencies. The refactoring features, which work across the whole codebase, were also amazing: quick and reliable.

I suspect the same developer experience could be achieved with VSCode plus Ruby and JavaScript/TypeScript plugins, as it is pretty powerful now, so you don’t need JetBrains licenses.

Profiling

After the first version of the library was ready, I tried to run it on a real project. It ran VERY slowly and, because of some minor bugs, reported a LOT of issues. I used VSCode to profile it and discovered that for highly linked static sites (every page has approximately 200 links to other internal pages) my version of the library was parsing all internally linked pages many, many times, so a site with 50 pages took 5 minutes to check. Once I had identified and fixed the repeated parsing of HTML files, the same input took several seconds. It is still far from perfect, but now it runs in a sensible time.
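
The actual fix is not shown here, but one common way to avoid repeated parsing is to cache parsed documents by file path. The following is a minimal sketch under that assumption; parsePage is a hypothetical stand-in for the real HTML parser, and ParsedPage and getParsedPage are illustrative names that do not come from html-proofer.js.

```typescript
// Stand-in for a parsed HTML document (hypothetical type).
interface ParsedPage {
  path: string
  anchors: Set<string>
}

// Counts how often the expensive parse step actually runs.
let parseCount = 0

// Stand-in for an expensive HTML parser: it only collects id="..." values.
function parsePage(path: string, html: string): ParsedPage {
  parseCount++
  const anchors = new Set<string>()
  const idPattern = /id="([^"]+)"/g
  let match: RegExpExecArray | null
  while ((match = idPattern.exec(html)) !== null) {
    anchors.add(match[1])
  }
  return { path, anchors }
}

const pageCache = new Map<string, ParsedPage>()

// Parse each file at most once, however many links point to it.
function getParsedPage(path: string, html: string): ParsedPage {
  let page = pageCache.get(path)
  if (page === undefined) {
    page = parsePage(path, html)
    pageCache.set(path, page)
  }
  return page
}

// 200 links to the same target page trigger a single parse.
const targetHtml = '<h1 id="intro">Intro</h1><h2 id="usage">Usage</h2>'
let resolved = 0
for (let i = 0; i < 200; i++) {
  if (getParsedPage('docs/index.html', targetHtml).anchors.has('usage')) {
    resolved++
  }
}
```

With a cache like this, the parsing cost grows with the number of pages rather than with the number of links between them.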

Use-cases

There are two main use-cases (as for the original library):

  • checking static site integrity (Links, Images, Scripts, Favicons, OpenGraph) through the CLI, which is similar to that of the original Ruby library
  • implementing custom checks through the library API

Here is an example of a custom check that looks for anchor tags and reports a check failure if one of them links to mailto:octocat@github.com.

Code for the custom check:

const {HTMLProofer, Check, DummyReporter} = require('html-proofer.js')  
  
class MailToOctocat extends Check {  
  internalRun() {  
    for (const node of this.html.css('a')) {  
      const link = this.createElement(node)  
  
      if (link.isIgnore()) {  
        continue  
      }  
  
      if (this.isMailtoOctocat(link)) {  
        this.addFailure(`Don't email the Octocat directly!`, link.line)  
      }  
    }  
  }  
  
  isMailtoOctocat(link) {  
    return link.url.rawAttribute === 'mailto:octocat@github.com'
  }  
}

Main method to call

// DummyReporter does not output anything
// We just need check results
let reporter = new DummyReporter()
let opts = {  
  checks: [MailToOctocat],
}  
  
const main = async () => {
  // create proofer for directory checking
  const proofer = HTMLProofer.checkDirectory('<target directory>', opts, reporter) 
  // run all checks
  await proofer.run()
  // all checks are in `failedChecks` property
  console.log(proofer.failedChecks)
}  
  
main()

Output

Running 1 check (MailToOctocat) in <target directory> on *.html files...

Ran on 1 file!

HTML-Proofer found 1 failure!
[
  Failure {
    path: '<target directory>/mailto_octocat.html',
    checkName: 'MailToOctocat',
    description: "Don't email the Octocat directly!",
    line: 3,
    status: null,
    content: null
  }
]

Checking code quality

After porting Ruby to JavaScript line by line, it was time to check the quality of the new code. I used ESLint, a JavaScript linter that covers both JavaScript and TypeScript. It helped a lot to fix obvious style issues and detect some potential bugs.

The TypeScript compiler was configured in strict mode (tsconfig.json), which enables all strict type-checking options:

{  
  "compilerOptions": {  
...
    "strict": true
...
  }
}

As a second step, I added a SonarQube scan, which is available for free as a service (SonarCloud) for Open Source projects. SonarQube detected another set of more sophisticated potential issues. It also calculates Cyclomatic Complexity and Cognitive Complexity and raises issues if the code is too complex.

Cognitive Complexity is a measure of how difficult a unit of code is to intuitively understand. Unlike Cyclomatic Complexity, which determines how difficult your code will be to test, Cognitive Complexity tells you how difficult your code will be to read and understand.

These checks forced me to refactor by extracting methods and rewriting functions to simplify the flow. SonarCloud project page for html-proofer.js: SonarCloud project report
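
To give an idea of what such a refactoring looks like, here is a small sketch. Nested conditions add to Cognitive Complexity, while an extracted helper and early returns keep the flow flat. The Link shape and function names are hypothetical and are not taken from html-proofer.js.

```typescript
// Hypothetical link shape, for illustration only.
interface Link {
  url: string
  ignored: boolean
}

// Before: each level of nesting makes the function harder to follow.
function shouldCheckNested(link: Link): boolean {
  if (!link.ignored) {
    if (link.url.length > 0) {
      if (!link.url.startsWith('mailto:')) {
        return true
      }
    }
  }
  return false
}

// After: an extracted helper and early returns keep the flow flat.
function isMailto(link: Link): boolean {
  return link.url.startsWith('mailto:')
}

function shouldCheck(link: Link): boolean {
  if (link.ignored) return false
  if (link.url.length === 0) return false
  return !isMailto(link)
}

const samples: Link[] = [
  { url: 'https://example.com', ignored: false },
  { url: 'https://example.com', ignored: true },
  { url: '', ignored: false },
  { url: 'mailto:octocat@github.com', ignored: false },
]

// Both versions make the same decision for every sample.
const sameBehavior = samples.every((link) => shouldCheckNested(link) === shouldCheck(link))
```

Both functions behave identically; only the second one is easier to read and to extend with further conditions.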

Invested time

I’ve calculated the approximate time I’ve invested into this project:

  • 1st phase (Ruby to JavaScript) - 30 hours
  • 2nd phase (JavaScript to TypeScript) - 20 hours

There are still some minor things to do, but in general, I consider the porting process complete.

Conclusion

This was a very interesting and exciting journey in porting a library from one language to another. I’ve learned a lot of new things, and I feel that I understand the Ruby and JavaScript/TypeScript stacks better now. I also hope that this Open Source project will help other people cover their needs at work and in pet projects.
