rOpenSci | Blog


Support for hOCR and Tesseract 4 in R

Earlier this month we released a new version of the tesseract package to CRAN. This package provides R bindings to Google’s open source optical character recognition (OCR) engine Tesseract.

Two major new features are support for hOCR and support for the upcoming Tesseract 4.


hOCR output

Support for hOCR output was requested by one of our users on GitHub. The ocr() function gains a parameter HOCR which allows for returning results in hOCR format:
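A minimal sketch of how this parameter might be used is shown here. The helper name ocr_as_hocr and the file name scan.png are illustrative assumptions and are not part of the tesseract package; only ocr(), tesseract(), and the HOCR argument come from the package itself.

```r
library(tesseract)

# Hypothetical helper (not part of the tesseract package): run OCR on an
# image and return hOCR markup instead of plain text.
ocr_as_hocr <- function(image, language = "eng") {
  # Build an OCR engine for the requested language (training data must be installed).
  engine <- tesseract(language)
  # HOCR = TRUE asks ocr() to return hOCR (XML/HTML) output as a string.
  ocr(image, engine = engine, HOCR = TRUE)
}

# "scan.png" is a placeholder; substitute the path or URL of a real image.
# hocr <- ocr_as_hocr("scan.png")
# cat(hocr)
```

The returned string is markup rather than plain text, so it would typically be handed to an XML parser to pull out words and their bounding boxes.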

...

Introducing the 2018 rOpenSci Research Fellows!

rOpenSci’s mission is to enable and support a thriving community of researchers who embrace open and reproducible research practices as part of their work. Since our inception, one of the mechanisms through which we have supported the community is by developing high-quality open source tools that lower barriers to working with scientific data. Equally important to our mission is building capacity and promoting researchers who are engaged in such practices within their disciplinary communities. This fellowship program is a unique opportunity for us to enable such individuals to have a bigger voice in their communities....

.rprofile: Julia Stewart Lowndes

Dr. Julia Stewart Lowndes [@juliesquid on Twitter] is the Science Program Lead for the Ocean Health Index and works at the National Center for Ecological Analysis and Synthesis. She and Sean Kross discussed how data science, open science, and community can help reproducibility in research.

[This interview occurred at the 2017 rOpenSci unconference]

SK: I’m Sean Kross, I’m the CTO of the Johns Hopkins Data Science Lab. Today I’m interviewing Julia Stewart Lowndes. Julia, what is your current preferred job title?

...

Apply to attend rOpenSci unconf 2018!

For a fifth year running, we are excited to announce the rOpenSci unconference, our annual event loosely modeled on Foo Camp. rOpenSci unconferences have a rich history. You can get a feel for them by reading collected stories about people and projects from unconf17.

We’re organizing unconf18 to bring together scientists, developers, and open data enthusiasts from academia, industry, government, and non-profits for a couple of days to hack on various projects and generally enrich our community. The agenda is mostly decided during the unconference itself. Past projects have related to open data, data visualization, data publication and open science using R. This event is unlike many other unconferences in that it is primarily invite-only, with a few spots set aside for self-nominations from the community at large. That’s you!

...

The prequel to the drake R package

The drake R package is a pipeline toolkit. It manages data science workflows, saves time, and adds more confidence to reproducibility. I hope it will impact the landscapes of reproducible research and high-performance computing, but I originally created it for different reasons. This post is the prequel to drake’s inception. There was struggle, and drake was the answer.


Dissertation frustration

Sisyphus

My dissertation project was intense. The final computational challenge was to analyze multiple genomics datasets using an emerging method and its competitors. Even with GPU computing, which shrank days of runtime down to hours, the full battery of Markov chain Monte Carlo runs took several weeks from start to finish. I organized my workflow as an R package, and I worked in a loop:

...

Working together to push science forward

Happy rOpenSci users can be found at