Comparing On-device OCR Frameworks Apple Vision and Google MLKit
June 30, 2021
9 min read
June 21, 2021
11 min read
Recently, a customer challenged us with displaying large and highly detailed PDFs in the browser. The requirement was not just to show them, which works out of the box, but to work with them. That means not only zooming in and out of a plan, but also switching easily between different PDFs and adding markers at freely selectable positions, all while keeping this functionality intact.
As you may have experienced yourself, dealing with PDFs in the browser is not very convenient. Currently there is no easy way to show a PDF inline; clicking it will usually open it in a new tab or download it to your filesystem. This is quite different from mobile devices such as an Apple iPad, where you can simply display the PDF and work with it as you would with an image. But couldn't we do it with an iframe? You will find plenty of other blog posts explaining why that is not a good idea. Above all, how would you add markers or enable other features? So we developed a different solution, which we describe in the following paragraphs.
One of our first guesses was:
We then thought: if we already have long loading times, the images should at least be sharp.
At first we were pretty happy, except for the loading times. They increased a lot as the PDFs grew in file size or page width. After trying several conversion methods and compression settings, we had to admit to ourselves that 40 MB SVGs aren’t very convenient either.
The keyword is tiling. It makes it possible to load only the information or images the user actually wants to see. This is how various platforms show maps in the browser: how else could you show the whole world in such detail without loading only the necessary parts? There are multiple libraries for this on the market, including open-source ones. We discussed the popular ones and decided to go with Leaflet, a JavaScript library for interactive maps and, in their own words, the leading one.
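To make the idea concrete, here is a minimal sketch of how a viewer can work out which tiles it needs for the current view. It is not Leaflet's actual implementation; the `Viewport` type, the `visibleTiles` helper and the 256 px tile size are assumptions made for illustration.

```typescript
// Hypothetical types and helper, not part of Leaflet.
interface Viewport {
  left: number;   // x of the left edge, in pixels of the image at the current zoom level
  top: number;    // y of the top edge, in the same pixel space
  width: number;  // viewport width in pixels
  height: number; // viewport height in pixels
}

interface TileRange {
  minX: number;
  maxX: number;
  minY: number;
  maxY: number;
  count: number;
}

// Returns the inclusive range of tile columns and rows that intersect the viewport.
function visibleTiles(view: Viewport, tileSize = 256): TileRange {
  const minX = Math.floor(view.left / tileSize);
  const minY = Math.floor(view.top / tileSize);
  // Subtract 1 so a viewport ending exactly on a tile border does not pull in the next tile.
  const maxX = Math.floor((view.left + view.width - 1) / tileSize);
  const maxY = Math.floor((view.top + view.height - 1) / tileSize);
  return { minX, maxX, minY, maxY, count: (maxX - minX + 1) * (maxY - minY + 1) };
}

// A 1280 x 720 viewport somewhere inside a large plan needs only a few dozen tiles.
const exampleRange = visibleTiles({ left: 1000, top: 600, width: 1280, height: 720 });
console.log(exampleRange.count); // 24
```

However large the plan is, the number of images requested depends only on the size of the viewport, which is why tiling keeps loading times flat.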
The basic system we started with is built on React and enhanced with Next.js. To make things in JavaScript a little easier, we also use TypeScript. We mostly build apps without a backend of our own and use Google Firebase instead. There are tutorials on the web on how to set up such a system, if you are interested.
The flow through the app is as follows: a user uploads a PDF in the web frontend, and the app handles the remaining steps on its own. After around 1:15 min (depending on the file size of the PDF), the PDF is shown as a tiled, high-performance map in the browser.
If you don’t see yourself setting up such things, feel free to contact us; we are happy to help you with any kind of project.
The uploaded PDF gets passed to a server, which converts it to a PNG. For this we use ImageMagick, which is pretty easy to handle. A command like:
magick convert -background white -alpha remove -colorspace RGB -units PixelsPerInch -depth 8 -density 400 example.pdf converted-example.png
does the magic. We added a couple of command-line options to get the best possible result; it’s mostly trial and error. After the conversion, the converted image is tiled. For this we use gdal2tiles-leaflet, a Python-based script that generates raster image tiles for Leaflet. Its usage is pretty similar to ImageMagick’s, but the setup can be a bit tricky. We recommend trying ImageMagick and gdal2tiles-leaflet together locally on your machine first and then moving them to a server. The .py script also needs a little setup (as described in the package description) and can then be called from the terminal, e.g. with the following command:
gdal2tiles.py -l -p raster -z 3-7 -w none converted-example.png
As you can see, there are multiple command-line options available here as well, e.g. -z, which specifies the zoom levels to generate for Leaflet. With -z 3-7, gdal generates a total of five zoom levels (3 through 7) for the PDF. The outcome of this command is five folders with multiple sub-folders, so be prepared to handle a lot of files. At the end of the PNG-to-tiles conversion we get around 4-5k images in total.
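To get a feeling for where a number like this comes from, the tile count can be estimated up front. The following is only a rough sketch under several assumptions that are not taken from gdal2tiles itself: tiles are 256 px squares, the highest requested zoom level shows the image at full resolution, and every lower level halves the resolution. The helper names and the example image size (roughly an A0 page rendered at 400 dpi) are hypothetical.

```typescript
// Expands a zoom range such as "3-7" into [3, 4, 5, 6, 7]; a single value like "5" yields [5].
function parseZoomRange(range: string): number[] {
  const [minStr, maxStr = minStr] = range.split("-");
  const min = Number(minStr);
  const max = Number(maxStr);
  if (!Number.isInteger(min) || !Number.isInteger(max) || min < 0 || max < min) {
    throw new Error(`Invalid zoom range: ${range}`);
  }
  return Array.from({ length: max - min + 1 }, (_, i) => min + i);
}

// Estimates the number of tiles over all zoom levels.
// Assumption: the highest zoom level is full resolution, each lower level is half the size.
function estimateTileCount(
  widthPx: number,
  heightPx: number,
  zoomRange: string,
  tileSize = 256,
): number {
  const levels = parseZoomRange(zoomRange);
  const maxZoom = levels[levels.length - 1];
  let total = 0;
  for (const zoom of levels) {
    // One tile at this level covers tileSize * 2^(maxZoom - zoom) pixels of the original image.
    const coveredPx = tileSize * 2 ** (maxZoom - zoom);
    total += Math.ceil(widthPx / coveredPx) * Math.ceil(heightPx / coveredPx);
  }
  return total;
}

// Hypothetical 13240 x 18720 px image with the zoom range from the command above.
console.log(estimateTileCount(13240, 18720, "3-7")); // 5147
```

Most of the files belong to the highest zoom level, so dropping or adding a single level at the top changes the total far more than anything at the bottom of the range.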
To handle all those files and make them available to our app, we save them to the storage of Google Firebase. Pay attention to the Firebase security rules: you want to set them up, otherwise all your files will be reachable by everyone. This can also be a little tricky, because you need to generate tokens that your app knows about so that the URL for the Leaflet TileLayer is correct.
To view the tiled PDF in the browser we use not only Leaflet itself but also react-leaflet and the corresponding types for TypeScript. react-leaflet provides bindings between React and Leaflet, so Leaflet can be used like a React component. To avoid the common issue of window being missing during server-side rendering, import the component dynamically, e.g. like this:
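As an illustration, a URL template for the TileLayer could be assembled roughly like this. This is a sketch, not our production code: it assumes the usual Firebase Storage download URL shape, a folder layout of z/x/y.png, and one download token shared by all tiles of a plan (which you would have to set yourself when uploading). The function name and all example values are hypothetical.

```typescript
// Builds a Leaflet URL template that points at tiles stored in Firebase Storage.
// Assumption: every tile of the plan was uploaded with the same download token.
function tileUrlTemplate(bucket: string, tilesFolder: string, token: string): string {
  // Firebase Storage addresses an object by its URL-encoded path, so "/" becomes "%2F".
  const encodedFolder = encodeURIComponent(tilesFolder);
  // The {z}, {x} and {y} placeholders stay unencoded so Leaflet can fill them in per tile.
  const objectPath = `${encodedFolder}%2F{z}%2F{x}%2F{y}.png`;
  const query = `alt=media&token=${encodeURIComponent(token)}`;
  return `https://firebasestorage.googleapis.com/v0/b/${bucket}/o/${objectPath}?${query}`;
}

// Hypothetical bucket, folder and token.
const planTilesUrlExample = tileUrlTemplate("my-project.appspot.com", "plans/plan-1", "example-token");
console.log(planTilesUrlExample);
```

The resulting string can be passed as the url of the TileLayer. If every file carries its own token instead, a single template is not enough and the URLs have to be resolved per tile.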
// Inside your component: load the Leaflet component on the client only.
const PlanTiles = React.useMemo(
  () =>
    dynamic(() => import('../components/planTiles'), {
      ssr: false,
    }),
  [planTilesUrl],
);
Then simply return <PlanTiles /> with its props, if there are any.
The dependency array lets you re-render your tiled image, e.g. when the planTilesUrl state changes. In our app this means: when a user clicks a button to change the current plan, the planTilesUrl state changes and the tiled plan is re-rendered with the updated URL.
These steps are all that is needed to show the PDF as a tiled plan in the browser. As mentioned at the beginning, we wanted more features than just showing the plan, so we moved on to customizing Leaflet’s markers. If you want such features, it is advisable to use React components as your markers, also to keep an overview of what is happening in your app. For customizing markers, Leaflet provides an icon property on the marker.
<Marker icon={icon(someProp)} … />
To create a custom icon you can use new DivIcon from Leaflet itself. We did it like this:
const icon = (someProps: Props) =>
  new DivIcon({
    // DivIcon only accepts an HTML string, so the component is rendered to markup first.
    html: renderToString(
      <ThemeProvider theme={useTheme()}>
        <YourComponent yourProps={someProps} />
      </ThemeProvider>,
    ),
    className: classes.Icon,
  });
DivIcon only accepts HTML, so you need to render your component with renderToString. To make e.g. your Material UI theme available to the component, you have to wrap it in a <ThemeProvider … />; otherwise the theme will not be available to the component you pass in. That’s it, you’re done!
With this knowledge you have a rough overview of how to convert and tile a PDF, show it in the browser, and add functional markers.
To give you some inspiration of what else is possible, we will talk about what else we got to work.
Hopefully we could inspire you a little or help with one of your current questions about Leaflet. If you have more specific questions about a particular issue, please let us know.