Really odd problem extracting BibTeX from a science publisher lol
Mike Marchywka
marchywka at hotmail.com
Sun Oct 3 20:53:08 CEST 2021
I had never visited this site before, and it looked pretty normal
in the web browser:
https://cdnsciencepub.com/doi/abs/10.1139/o59-099
but when I ran "TooBib" to extract the citation info, it looked
like it was stuck in an infinite loop. However, the stack trace was
in the HTML parsing, which was supposed to work. It turned out the
HTML was over 20 MB and rendered in lynx with
89k links lol:
lynx -dump -force-html junk/xxx | tail -n 100
89554. file://localhost/home/documents/cpp/proj/toobib/junk/xxx
89555. https://www.facebook.com/cdnsciencepub
89556. https://twitter.com/cdnsciencepub
89557. https://www.linkedin.com/company/canadian-science-publishing
89558. https://www.youtube.com/user/cdnsciencepub?feature=results_main
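One cheap way to spot a page like this before handing it to a full parser is to check its size and count its anchors first. Below is a minimal sketch using only Python's standard `html.parser`. It counts `<a href=...>` tags, which is roughly (not exactly) what `lynx -dump` lists as references. The `count_links` name and the `max_bytes` cutoff are hypothetical choices for illustration, not anything TooBib actually does.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a href=...> anchors, roughly what `lynx -dump` lists as references."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs. Only anchors that carry an href are counted.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1


def count_links(html: str, max_bytes: int = 5_000_000) -> int:
    """Return the number of href anchors, refusing pages larger than max_bytes.

    max_bytes is an arbitrary, hypothetical cutoff, not a value TooBib uses.
    """
    size = len(html.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"page is {size} bytes, over the {max_bytes}-byte limit")
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

A 20 MB page would be rejected by the size check before any parsing happens, so a handler could fall back to some other route (such as the DOI in the URL) instead of grinding through it.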
I'm not sure whether my code would eventually have worked; the Zotero web
form returned an error after a little while. But the DOI is in
the link, so I just redirect that site to an existing handler
for that case lol.
(This uses the Crossref x-bibtex facility, which seems to
consistently drop the journal info. I'm switching to parsing
their JSON output, but right now it just uses the publisher as the journal,
which works sometimes...)
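The redirect amounts to pulling the DOI out of the URL path and building the Crossref transform URL that appears as `citeurl` in the output. Here is a minimal Python sketch of that step, assuming the DOI sits in the path as it does on cdnsciencepub.com. The regex is a simplified assumption rather than a full DOI grammar, and the function names are hypothetical, not TooBib's.

```python
import re
from typing import Optional
from urllib.parse import quote, unquote, urlparse

# Simplified assumption: a DOI is "10.", a 4-9 digit registrant code, "/",
# then a suffix running until whitespace, "?" or "#".
DOI_RE = re.compile(r"(10\.\d{4,9}/[^\s?#]+)")


def doi_from_url(url: str) -> Optional[str]:
    """Pull a DOI out of a publisher URL path like /doi/abs/10.1139/o59-099."""
    path = unquote(urlparse(url).path)
    match = DOI_RE.search(path)
    return match.group(1) if match else None


def crossref_bibtex_url(doi: str) -> str:
    """Build the Crossref x-bibtex transform URL (same form as the citeurl line)."""
    # quote() keeps "/" by default, so "10.1139/o59-099" passes through unchanged.
    return f"http://api.crossref.org/works/{quote(doi)}/transform/application/x-bibtex"
```

For the URL above, `doi_from_url` gives `10.1139/o59-099`, and `crossref_bibtex_url` turns that into the same `citeurl` TooBib recorded.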
% mjmhandler: toobib guesscdnscience<-handledoilink
% date 2021-10-03:11:04:30 Sun Oct 3 11:04:30 EDT 2021
% srcurl: https://cdnsciencepub.com/doi/abs/10.1139/o59-099
% citeurl: http://api.crossref.org/works/10.1139/o59-099/transform/application/x-bibtex
@article{1959_Bligh_Dyer_RAPID_METHOD_TOTAL_LIPID,
X_TooBib = {journal: ReWriteParse be.get(s)=Canadian Science Publishing be.get(dest)=},
author = {E. G. Bligh and W. J. Dyer},
doi = {10.1139/o59-099},
journal = {Canadian Science Publishing},
month = {aug},
number = {8},
pages = {911--917},
publisher = {Canadian Science Publishing},
title = {A {RAPID} {METHOD} {OF} {TOTAL} {LIPID} {EXTRACTION} {AND} {PURIFICATION}},
url = {https://doi.org/10.1139%2Fo59-099},
volume = {37},
year = {1959},
srcurl={https://cdnsciencepub.com/doi/abs/10.1139/o59-099},
xsrcurl={https://cdnsciencepub.com/doi/abs/10.1139/o59-099},
citeurl={http://api.crossref.org/works/10.1139/o59-099/transform/application/x-bibtex}
}
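For the switch to JSON: Crossref's works endpoint (the `citeurl` above without the `/transform/...` suffix) returns a record whose `message` object has a `container-title` list alongside `publisher`. A minimal sketch of picking the journal from that, falling back to the publisher the way the current handler does, might look like this. The function name is hypothetical, and the sample strings in the usage note are made-up placeholders, not real Crossref output.

```python
import json
from typing import Any, Dict


def journal_from_crossref(payload: str) -> str:
    """Pick a journal name from a Crossref /works/{doi} JSON response.

    Prefers message["container-title"] (a list of strings in Crossref's
    schema) and falls back to message["publisher"], which is what the
    x-bibtex based handler effectively does now.
    """
    record: Dict[str, Any] = json.loads(payload)
    message = record.get("message", {})
    titles = message.get("container-title") or []
    if titles:
        return titles[0]
    return message.get("publisher", "")
```

With a hypothetical trimmed response such as `{"message": {"publisher": "Canadian Science Publishing", "container-title": ["Example Journal Title"]}}` this returns `Example Journal Title`. If `container-title` is missing or empty, it returns the publisher, which would reproduce the `journal = {Canadian Science Publishing}` line above.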
--
mike marchywka
306 charles cox
canton GA 30115
USA, Earth
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X
More information about the texhax mailing list.